The quality of realism in virtual environments is typically considered to be a
function of visual and audio fidelity mutually exclusive of each other. However, the
virtual environment participant, being human, is multi-modal by nature. Therefore, in
order to more accurately validate the levels of auditory and visual fidelity required in a
virtual environment, a better understanding is needed of the intersensory or cross-modal
effects between the auditory and visual sense modalities.

To identify whether any pertinent auditory-visual cross-modal perception 
phenomena exist, 108 subjects participated in three main experiments which were 
completely automated using HTML, Java, and JavaScript computer programming 
languages. Visual and auditory display quality perception was measured intramodally
and intermodally by manipulating visual display pixel resolution and Gaussian white 
noise level and by manipulating auditory display sampling frequency and Gaussian white 
noise level.

Statistically significant results indicate that 1) medium or high-quality auditory 
displays coupled with high-quality visual displays increase the quality perception of the 
visual displays relative to the evaluation of the visual display alone, and 2) low-quality 
auditory displays coupled with high-quality visual displays decrease the quality 
perception of the auditory displays relative to the evaluation of the auditory display alone. 
These findings strongly suggest that the quality of realism in virtual environments must
be a function of both auditory and visual display fidelities inclusive of each other.
I. INTRODUCTION


A. MOTIVATION 


The fidelity requirements for virtual environments have traditionally focused on 
the singular modality of vision. As a result, in an attempt to render visual displays as 
close as possible to the fidelity of the human visual system, the fidelity of visual display 
systems has increased dramatically in the last ten years. Likewise, as a result of better 
audio technology, there has been a recent surge of emphasis on the fidelity requirements 
concerning the singular modality of audition. As a result, the fidelity of auditory display 
systems has increased dramatically in the last five years. These rapid advances in visual 
and auditory display technologies have helped to create increasingly realistic virtual 
environments. The quality of realism in these virtual environments is typically considered 
to be a function of visual and audio fidelity mutually exclusive of each other [BARF95]. 
Herein lies a problem: the virtual environment participant, being human, is multi-modal
by nature. Thus, the quality of realism in virtual environments, and with it their fidelity
requirements, needs to be based on multi-modal criteria comprising all of our senses, as
opposed to the current use of singular modality criteria. However, insufficient
experimental data exist to make informed multi-modal design decisions.


B. OBJECTIVE 


Because of limitations in today’s computer technology, it is impossible to
render realistic information to all of our senses in real-time to the interactive virtual
environment participant. However, since there have been significant advances in visual
and audio display technology, it is appropriate to concentrate on the vision and audition
sensory modalities. As such, this research effort focuses on these two sensory
modalities. In particular, the objective of this effort is to gain a better understanding
of the intersensory or cross-modal effects between the auditory and visual sense
modalities. By gaining a better understanding of auditory-visual cross-modal effects,
system designers can more accurately verify and validate the levels of auditory and
visual fidelity required for the immersed virtual environment participant.


C. SCOPE 


Intersensory phenomena have been studied for many years by researchers in 
numerous disciplines such as: Psychoacoustics, Psychology, Physiology, Neurology, 
Philosophy, Musicology, Ecology, and Computer-Human Interaction, and by different 
organizations such as: Human Factors, Audio Engineering Society, Acoustical Society of
America, Department of Defense, Artistic Community, and also the Film and 
Entertainment Industry. Thus, there is a large amount of intersensory research, but this 
knowledge is often kept within the discipline from which it was derived. Consequently, 
there is little cross-disciplinary transfer of intersensory knowledge. This lack of cross- 
disciplinary knowledge exists not only with intersensory research, but also seems to 
extend to many areas of academic and commercial interests. This is a pity, for there are 
no doubt countless examples of redundant research efforts all because of a lack of cross- 
disciplinary knowledge exchange. Nevertheless, in terms of modeling and simulation, the 
National Research Council (NRC) has recently investigated the possible collaboration 
opportunities between the Department of Defense and the Entertainment Industry 
[ZYDA97]. This collaboration is a much needed first step towards better cross- 
disciplinary knowledge transfer. 

Computer Science in particular is severely lacking in its knowledge and use of 
intersensory phenomena. Therefore, it is important to note that the scope of this effort is 
filtered through the perspective of a computer scientist for use by other computer 
scientists. The results of this effort are intended to aid the computer scientist in 
developing better virtual worlds through appropriate use of auditory and visual display 
fidelities based on auditory-visual cross-modal perception phenomena. It is also
important to note that the scope of this effort is not to identify absolute visual and/or
audio fidelity requirements such as pixel resolution and sampling frequency respectively,
but rather to identify the effects of auditory-visual cross-modal perception phenomena
which can be used to justify a certain level of audio and/or visual fidelity.


D. APPROACH 


The approach taken is that of the experimental psychologist. A series of
experiments was designed to identify whether any pertinent auditory-visual
cross-modal perception interactions exist. Specifically, one pilot study and three main experiments
were conducted. Each of the three main experiments was completely automated using 
Hyper Text Markup Language (HTML), Java, and JavaScript [FLAN96] [LADD98]. The 
pilot study was also completely automated but was developed using Virtual Reality 
Modeling Language (VRML) [HART96] [LEAR96] [ROEH97]. All experiments were 
conducted at the Naval Postgraduate School (NPS) in Monterey, California. A total of
130 volunteer participants drawn from the students, faculty, staff, and guests of NPS
served as subjects. Each experiment involved a 3x3 factorial within-subjects design. (See
[GOOD95] for a description of factorial design experiments.) The two independent
variables were visual and audio display quality having three levels each consisting of 
low, medium, and high qualities. The visual display parameters that were manipulated 
were pixel resolution and Gaussian white noise level. The audio display parameters that 
were manipulated were sampling frequency and Gaussian white noise level. Partial 
counterbalancing was achieved through the technique of balanced Latin squares. (See 
[GOOD95] for a description of the Latin squares technique.) The basic idea of the 
experiments was to manipulate visual and auditory display parameters intra-modally and
inter-modally and to likewise measure visual and auditory display perception
intra-modally and inter-modally. During the experiments, which each lasted approximately 30
minutes, a single subject wore headphones and sat in front of a 20-inch display monitor.
The task of the subject was to rate the perceived quality of audio-only, visual-only, and
audio-visual displays through Likert rating scales ranging from 1 to 7. (See [GOOD95]
for a description of Likert rating scales.) Thus, the dependent variables are the perception
of visual display quality and the perception of auditory display quality. It is hoped that by
carefully varying the fidelity of both auditory and visual displays, it will be possible to
measure auditory-visual cross-modal perception interactions. Specifically, this effort aims
to answer the following question: in an audio-visual display, what effect (if any) do
various audio quality levels have on the perception of visual quality and vice versa? The
following are some examples:

1) Are changes in the audio and/or visual qualities of an audio-visual display
perceivable and can these changes be attended to also?


2) Does a high-quality auditory display coupled with a low-quality visual display 
cause a decrease/increase in the perception of audio quality and/or an increase/decrease in 
the perception of visual quality relative to established baseline conditions derived from 
auditory-only and visual-only quality perception evaluations? 


3) Does a low-quality auditory display coupled with a high-quality visual display 
cause an increase/decrease in the perception of audio quality and/or a decrease/increase in 
the perception of visual quality relative to established baseline conditions derived from 
auditory-only and visual-only quality perception evaluations? 


4) Does a low-quality auditory display coupled with a low-quality visual display 
cause a decrease/increase in the perception of audio quality and/or a decrease/increase in 
the perception of visual quality relative to established baseline conditions derived from 
auditory-only and visual-only quality perception evaluations? 


5) Does a high-quality auditory display coupled with a high-quality visual display
cause an increase/decrease in the perception of audio quality and/or an increase/decrease
in the perception of visual quality relative to established baseline conditions derived from
auditory-only and visual-only quality perception evaluations?
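The counterbalancing of a 3x3 within-subjects design, as described in this section, can be illustrated with a short sketch. The Python below is not the software used in these experiments, which were written in HTML, Java, and JavaScript. It assumes that the nine conditions are the crossings of three visual and three audio quality levels, and it uses one standard construction of a balanced Latin square (often called a Williams design), which may differ from the exact squares used here. The names LEVELS, williams_orders, and carryover_counts are hypothetical and introduced only for illustration.

```python
from collections import Counter
from itertools import product

# Assumed factor levels; the two factors are visual and audio display quality.
LEVELS = ("low", "medium", "high")


def williams_orders(n):
    """Return presentation orders over condition indices 0..n-1.

    The first row is 0, 1, n-1, 2, n-2, ...; the remaining rows are cyclic
    shifts of it. For odd n the mirror image of every row is appended so
    that first-order carryover is balanced.
    """
    first, lo, hi = [0], 1, n - 1
    while len(first) < n:
        first.append(lo)
        lo += 1
        if len(first) < n:
            first.append(hi)
            hi -= 1
    rows = [[(c + shift) % n for c in first] for shift in range(n)]
    if n % 2 == 1:
        rows += [row[::-1] for row in rows]
    return rows


def carryover_counts(orders):
    """Count how often each condition immediately follows each other one."""
    return Counter(pair for row in orders for pair in zip(row, row[1:]))


# Nine (visual, audio) conditions of the assumed 3x3 factorial design.
conditions = list(product(LEVELS, LEVELS))
orders = williams_orders(len(conditions))

# Presentation order for a hypothetical first subject.
for index in orders[0]:
    visual, audio = conditions[index]
    print(f"visual={visual:<6} audio={audio}")
```

With nine conditions the construction yields 18 orders, so subjects would be assigned to orders in multiples of 18 for the balance to hold exactly. In this sketch every condition appears equally often in every serial position and immediately follows every other condition equally often.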


E. LIMITATIONS 


Another facet of this effort was to confine all software development to the ever- 
evolving internet technology. The reasons for this are as follows: 

1) To easily obtain software. All the software used to execute the experiments in
this effort was simply downloaded. This downloaded software included: Netscape 2.0,
3.0, and 4.0 [NETS98]; Sun’s Java Development Kit (JDK) 1.0, 1.1.2, 1.1.4, and 1.1.5
[SUNM98]; Silicon Graphics Inc. (SGI) CosmoPlayer VRML 2.0 beta Netscape Plugin
and VRML 2.0 Release Netscape Plugin [COSM98]; Sony’s Community Place VRML
2.0 Browser [SONY98b]; and Intervista’s WorldView 2.0 Browser [INTE98].

2) To reduce cost. All downloaded software was free! 

3) To verify the feasibility of conducting scientific experiments with HTML/Java/
JavaScript/VRML.

4) To support seamless portability and repeatability of research. The experiments 
outlined in this dissertation are currently being set up to be repeated at the College of 
Computing at Georgia Institute of Technology in Atlanta, Georgia. 

5) To eventually conduct on-line auditory-visual cross-modal experiments which 
potentially have thousands (if not millions) of subjects/trials. 

Another chosen limitation was that of hardware. To complement the ease of 
access and portability of all software, all the hardware used in this effort is available as 
commercial off-the-shelf (COTS) products. As such, no specific, hard-to-get, or
intractably expensive piece of hardware is needed for this research effort.


F. DISSERTATION ORGANIZATION 


This dissertation is organized around ten chapters, including a list of references, a
bibliography, and four appendices. Chapter II discusses relevant background material 
including: Perception, The Senses, Audition, Vision, Attention, Gestalt Theory, 
Synesthesia, and Multimedia. Chapter III presents a thorough literature review covering: 
Virtual Environments (VE), Auditory-Visual Perceptual Organization, Auditory-Visual
Art Forms and Film, Auditory-Visual Cross-Modal Matching, Visual Dominance Over
Audition, Auditory-Visual Threshold Perception, and Auditory-Visual Suprathreshold
Perception. Chapter IV discusses the issues relevant to the overall development of the
experimental design process including: Motivation, Design Considerations, Design
Selections, and Software Design. Chapter V discusses Visual Display Development,
Auditory Display Development, and Auditory-Visual Display Development. Chapter VI
gives a complete description of the experimental design of the initial pilot study to
include: Location, Participants, Apparatus, Procedure, Results and Discussion, and
Summary and Conclusions. Chapter VII gives a complete description of the experimental
design involving visual display pixel resolution manipulation of a static radio image, as 
well as auditory display sampling frequency manipulation of a section of music 
including: Location, Participants, Apparatus, Procedure, Changes from Pilot Study, Data
Collection and Analysis, Results and Discussion, and Summary and Conclusions. 
Chapter VIII gives a complete description of the experimental design involving visual 
display Gaussian white noise level manipulation of a static radio image, as well as 
auditory display Gaussian white noise level manipulation of a section of music including:
Location, Participants, Apparatus, Procedure, Results and Discussion, and Summary and 
Conclusions. Chapter IX gives a complete description of the experimental design 
involving visual display pixel resolution manipulation of a fruit-flower scene, as well as 
auditory display sampling frequency manipulation of a section of music including: 
Location, Participants, Apparatus, Procedure, Results and Discussion, and Summary and 
Conclusions. Chapter X presents the overall findings of this dissertation to include: 
Overall Results, Conclusions, Impact, Observations, Recommendations, Future Work,
and Final Thoughts.


II. BACKGROUND


A. INTRODUCTION 


The intent of this chapter is to give the computer scientist a high-level overview 
of some of the basic background knowledge which is required in order to understand this 
multi-disciplinary research effort. As such, the information outlined in this chapter is by 
no means comprehensive. Furthermore, the concepts outlined in this chapter lay the 
foundation for understanding the scope of this research effort. Because of the wide 
variety of topics covered including Perception, The Senses, Audition, Vision, Attention 
Theory, Gestalt Theory, Synesthesia, and Multimedia, the reader will hopefully gain a 
better appreciation for the interdisciplinary nature and breadth of knowledge required
when conducting intersensory research.


B. PERCEPTION 


1. Definition 


First and foremost it is important to remember that “We can only obtain a rather
one-sided idea of the development of perception if we neglect the interrelations of the
different senses in creating our perceptual world” [SCHL35]. With this in mind, a formal
definition of perception from a psychological point of view is as follows:


The psychology of perception, then, involves the study of the way an observer relates 
to his environment -- the way in which information is gathered and interpreted by an 
observer. This relationship is the result of a continuing process of learning, judging, 
interpreting, and reacting to the environment which begins at birth and continues 
throughout the life span of the individual. [MURC73] 


From a physiological perspective, the following describes the nature of a stimulus: 


An excitation originating in any of the receptors does not remain strictly localized, but 
irradiates to some extent throughout the entire nervous system, thus affecting the
excitatory states of all other mechanisms and consequently the sensory responses for 
which such excitatory states are important predisposing factors. [GILB41] 


2. Stimulus 


A stimulus is defined as “...any chemical or physical activator which causes a
response in a receptor” [FOST68]. In total, there are only six classes of stimuli: (1) 
mechanical, (2) thermal, (3) photic, (4) acoustic, (5) chemical, and (6) electrical. 
Furthermore, an effective stimulus is one that produces a sensation, the dimensions of 
which are: quality, intensity, extension, duration, and like and dislike [FOST68]. 

Murch explains that the term stimulus is but half of a pair of correlated terms, the
other half being response. As such, if we conform strictly to this correlated definition of
stimulus, a circular definition unfolds. “This concept of stimulus would force us to regard
the response as dependent on the object or event (stimulus) and the stimulus as dependent
on the response” [MURC73]. Hermann von Helmholtz tried to avoid this circular
definition by introducing the concepts of distal stimulus (the external object or event) and
proximal stimulus (the sensory representation of the stimulus by the nervous system)
[HELM66]. However, Helmholtz’s concepts of distal and proximal stimulus fall short
because the circularity problem remains: “The distal stimulus gives rise to the proximal
stimulus which in turn contributes to the building of a percept representative of the initial
distal stimulus” [MURC73]. The distinction between distal and proximal stimuli is
better explained by using the terms: potential stimulus and effective stimulus [GIBS66]
[GIBS67].


Any object or event in the environment is a potential stimulus. When such a potential
stimulus stands in a constant relationship with a given response, it is an effective stimulus.
Thus we are able to describe the environment independently of the responses of an
observer. This is particularly important when we consider that one is often unaware of all
the responses elicited by a stimulus. [MURC73]


The inherent linkage between sensation and perception can best be summed up as
follows: “To sense is to respond, to perceive is to know” [MURC73].

But what happens when we are exposed to multiple stimuli? When two or more
stimuli occur at the same time and/or space some very interesting perceptual phenomena
arise. The cause of these phenomena can be explained as follows: “When two qualitatively
different stimuli are applied to the same locus on the sensory surface very rapidly, rapidly
enough so that the two stimuli are perceived as a single event, the perceptual qualities of
the two [stimuli] merge” [MARK78]. Multiple stimuli response and sensory interaction
are the crux of this dissertation. Some of the well-known and accepted intersensory
theories and perspectives are presented in the next section.


C. THE SENSES 


1. Classification 


The concept of separate sense modalities has been around for a long time, with
roots dating back to the time of Aristotle (circa 384-322 B.C.) [WALK81]. Although we
typically believe we have only five senses, we really have upwards of 30 or 40 senses
depending on how the senses are classified. One such classification divides the senses
into the following modalities: Vision, Audition, Cutaneous Sensitivity, Olfaction,
Gustation, Kinesthesis, Labyrinthine Sensitivity, and Organic Sensitivity [FOST68].
Figure 1 depicts this classification of the senses along with associated sense organs,
stimulus, and sensory qualities.


Modality                  Cortical Nerve Projections            Normal Stimulus                Sensory Qualities

Vision                    occipital lobe                        photic energy                  colors (red, gray)
Audition                  temporal lobe                         acoustic energy                tones and noises
Cutaneous Sensitivity     parietal lobe                         mechanical and thermal energy  pressure, pain, heat, cold
Olfaction                 rhinencephalon                        volatile substances            odors (fragrant, spicy)
Gustation                 parietal lobe                         soluble substances             sweet, salt, sour, bitter
Kinesthesis               parietal lobe                         mechanical energy              pressure, pain
Labyrinthine Sensitivity  none (?), projects to the cerebellum  mechanical forces and gravity  none
Organic Sensitivity       parietal lobe                         mechanical energy              pain, pressure


Figure 1. Classification of the Senses From [FOST68].




In 1940, Ryan [RYAN40] conducted a thorough literature survey on sensory
interaction. Based on the intersensory research investigated, the following are some of
Ryan’s findings:


(1) ...it is extremely rare outside of the controlled conditions of the laboratory that
even a single object is the product of operations of a single sensory system.


(2) Under certain conditions it can be shown that qualities perceived by one sensory
system are influenced by stimuli reaching other sense organs.


(3) ...it is evident that sensory systems are part of a unified organism and by no means
isolated from one another. [RYAN40]


Ryan ultimately concludes that the study of the interrelations among the senses is
“...sorely in need of further investigation...” [RYAN40].

In 1941, Gilbert [GILB41] conducted another extensive literature review on 
intersensory facilitation and inhibition. It is interesting to note that Ryan was unaware of 
Gilbert’s work until after Ryan’s work was published, and Gilbert does not mention 
Ryan’s efforts. Nevertheless, Gilbert makes the following conclusions concerning the
effect of heteromodal (intersensory) stimulation on sensitivity to stimulus intensity:


(1) Under conditions of momentary heteromodal stimulation (a) a sufficiently intense
stimulus will momentarily reduce sensitivity in another modality, and increase it after an
optimum interval (about 1/2 sec.); (b) a less intense heteromodal stimulus will
momentarily increase sensitivity.


(2) Under conditions of prolonged stimulation, there is some evidence that the quality
of the heteromodal stimulus may determine the direction of the effect, some stimuli
acting as excitants, others as depressants. It is not clear, however, whether there is a
differential effect among the various modalities.


(3) The effect will be limited by the lability of the sensation affected, and individual
differences in their susceptibility to heteromodal influence. [GILB41]


Upon reviewing all intersensory research (through 1941), Gilbert realized that the then-current
view on the psychophysical aspect of intersensory interactions was lacking. Gilbert’s final
concluding remarks state that:


Modern psychophysics has produced overwhelming evidence of the inadequacy of the 
traditional static relationship between stimulus and response, wherein each attribute of a 
sensory response was conceived of as determined simply by the value of a corresponding 
physical dimension of the “adequate” stimulus. Actual experimental evidence... has 
shown that the dimensions of stimulation are inter-dependent in affecting a sensory 
response, and that sensation may be dependent on the interaction of excitations, on 
mental set, physiological state of the organism, practice, and numerous other factors, all 
interrelated in a constant state of flux. [GILB41]


In 1947, Sherrington [SHERR47] tries to explain higher-order sensory integration
as a process in which “...each sense system is served by specific receptors that project to
specific sensory centers in the brain. Intersensory interaction is the concept by which
multisensory stimuli of the real world (e.g., rhythm) are integrated in the brain”
(summarized by [WALK81]).

In 1954, London [LOND54] presented his findings based on the extensive
intersensory research conducted in the Soviet Union. Upon the review of numerous
intersensory experiments, London concludes that the conditions that influence sensory
interaction are best summarized as follows: 1) Strength of accessory stimulus, 2)
Excitatory state of sense organs, 3) Duration of accessory stimulation, 4) Termination of
accessory stimulation, 5) Affectivity of stimulus, 6) Physiological state, 7) Diurnal
variation, 8) Summation, repetition or cumulation of accessory effects [LOND54]
[STON68].

In reviewing London’s research efforts, Stone and Pangborn find that:


We respond to environmental stimuli through all avenues of sensory input, and, 
although the extent of their interrelationship is not well understood, it is generally 
accepted that the stimulation of one sense organ influences to some degree the sensitivity
of the organs of another sense. [STON68] 


Stone and Pangborn ultimately conclude that “...there exists a great need for further
definitive [intersensory] studies. Quantification of individual variability in response to 
dual stimulation does not seem to have been investigated, nor has three-way stimulation 
been reported” [STON68]. 

In 1966, Gibson [GIBS66] [GIBS79] suggests that: 


... perceptual systems cannot be gracefully categorized in terms of specific sensory 
systems, that under natural conditions many senses respond and interact to environmental 
stimulation, and the organism itself is initiating rather than reacting to events. This means
that intersensory perception and integration are not specialized higher-order complex 
reactions, but are the rule for all perception. (summarized by [WALK81]) 


In other words, it is the particular surrounding environment which determines how our 
senses respond and interact. As a result, sensory interaction must be based on the
complexity of natural life events and not on simple isolated systems.


In 1978, Lawrence Marks provided a more modern view of sensory interaction,
outlined in the excellent book, The Unity of the Senses: Interrelations
among the Modalities [MARK78]. From a simple to a more complex perspective, Marks
describes what he calls the Five Doctrines of sensory correspondence. Briefly, these five
doctrines are outlined as follows:


1. Doctrine of Equivalent Information. ...different senses can inform us about the
same features of the external world. 


2. Doctrine of Analogous Attributes and Qualities. Despite the salience of the 
phenomenal differences among qualities of various sense modalities, there are a few 
properties held in common. 


3. Doctrine that Different Senses have Corresponding Psychophysical Properties. 
...this theory proposes that at least some of the ways the senses behave and operate on 
impinging stimuli are general characteristics of sensory systems, similar from vision to 
hearing, from touch to olfaction. 


4. Doctrine that Similar or Identical Neurophysiological Mechanisms Parallel 
Sensory Correspondence. ...there is a neural analogue to each of the psychological 
doctrines [the first three doctrines]. 


5. Doctrine of the Unity of the Senses. ...incorporates all of the first four theories, and
in which the several senses are interpreted as modalities of a general, perhaps more
primitive sensitivity. [MARK78]


Based on the various intersensory research he studied, Marks believes that
the dimension of quality appears to show the fewest similarities from modality to
modality, but that intensity displays the strongest cross-modal similarity. However,
Marks concedes that “The entire area of cross-modality comparisons of sensory quality
has hardly been explored experimentally” [MARK78]. Furthermore, Marks concludes
that any sensory interaction is highly stimulus dependent. As Marks explains:


Perhaps the most crucial factor in determining the significance of any interaction is 
the objective relationship between the stimuli that are used. When stimuli presented to 
different senses bear no meaningful relation to each other, interaction often seems to be 
small or nonexistent. ...But meaningfully related stimuli are quite a different matter. ... 
Meaningful perceptual interactions...occur when concurrent information enters different 
sensory channels. [MARK78]


An interesting point by Marks which deserves mentioning is that: 


Similarity across the senses must necessarily be one step removed from similarity 
within a sense, for there is, by definition, no continuity between modalities. If the senses
were truly continuous there would only be one sense. [MARK78] 


In 1981, based on her research with blind and normal children, Susanna Millar 
[MILL81] concludes that the sense modalities are neither separate nor unitary. “They 
[modalities] are some of both, complementary to each other, and information can be used 
flexibly from different modalities” [WALK81]. A further conclusion that Millar makes is 
that “...we are slowly beginning to understand the interrelationships of the sense 
modalities. Global generalizations do not seem to hold. No one current theory seems 
capable of encompassing the diversity of findings” [WALK81]. 

In 1981, O’Connor and Hermelin [OCON81], having conducted experiments with
children suffering from either specific perceptual or general cognitive handicaps, describe 
sensory integration through the concept of sensory capture as follows: 


One aspect of sensory integration can be demonstrated by the phenomenon of 
“sensory capture,” in which conflicting input to different sense modalities is often not
perceived as such. Instead, the observer seems to resolve such conflict by making one 
sense impression conform with another dominant one. ...Such “capture” of one sensory 
input by another is of interest because it suggests that there may be a degree of perceptual 
equivalence between various sensory information, so that the same stimulus qualities tend 
to be perceived in various modalities. [OCON81] 


3. Neurological Perspective 


Because of recent advances in technology in the field of neurology, there has been 
a surge in intersensory research from a neurological perspective. The reason for this
much deserved neurological emphasis is that:


...there has been comparatively little done to understand the neural phenomena that
make multisensory integration possible. The paucity of neural data about multisensory 
integration is due in part to different strategies researchers have used to explore the 
functional organization of the nervous system, and also to the inherent difficulties in 
conducting multisensory studies. ...For while the perceptual phenomena demonstrates 
that interactions among different sensory modalities are commonplace and that 
constancies among the modalities must exist in order to use them together effectively, 
there is no comparable body of literature describing the neural mechanisms that underlie 
them. Nevertheless, there is a good deal of information about the location in the brain 
where inputs from different modalities converge. [STEI93] 


One place in the brain where visual, auditory, and somatosensory inputs converge is in
the superior colliculus as depicted in Figure 2. Furthermore, in looking at the horizontal
and vertical meridians of the different sensory representations in the superior colliculus,
one can see that they are very similar in terms of a common coordinate system.


Figure 2. The Superior Colliculus From [HARV98].


Stein and Meredith conclude that this common coordinate system suggests a representation of
Multisensory Space (see Figure 3). By examining the neurological responses of superior 
colliculus in various animals, primarily the cat, Stein and Meredith have found 
considerable evidence supporting the principles of multisensory convergence and 
interaction based on single neuron evoked potentials as depicted in Figure 4. Stein and 
Meredith believe that neurological studies in other animals are very important and lead to 
a better understanding of human perception. Thus, based primarily on the neurological 
studies of other animals, primarily cats, Stein and Meredith outline the rules in terms of 
space and time governing multisensory integration as based on unimodal receptive field 
characteristics as follows: 


Space: spatially coincident multisensory stimuli tend to produce response 
enhancement, whereas spatially disparate stimuli produce either depression or no 
interaction. 


Time: maximal multisensory interactions are not dependent on matching the onset of 
two different sensory stimuli, or their latencies, but on how the activity patterns resulting 
from the two inputs overlap. 


[Overall]...the spatial register among the receptive fields of multisensory neurons and 
their temporal response properties provide a neural substrate for enhancing responses to 
stimuli that covary in space and time and for degrading responses that are not spatially 
and temporally related. [STEI93] 


Figure 3. Common Coordinate System in the Superior Colliculus Suggesting 
Multisensory Space From [STEI93]. 


Figure 4. Convergence of Inputs from the Different Senses on 
a Single Neuron From [STEI93]. 
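

The space and time rules quoted above can be illustrated with a small sketch. This is only a toy reading of the rules, not a model taken from [STEI93]: the `Activity` record, the 20-degree field width, and the example values are all hypothetical. It shows that the time rule depends on the overlap of the evoked activity patterns rather than on matching onsets, and that the space rule depends on spatial coincidence. 

```python
from dataclasses import dataclass


@dataclass
class Activity:
    """Hypothetical activity pattern evoked by one unimodal stimulus."""
    onset_ms: float       # time at which the evoked activity starts
    duration_ms: float    # how long the evoked activity lasts
    location_deg: float   # stimulus position in the shared coordinate system


def temporal_overlap_ms(a, b):
    """Length of time during which both activity patterns are present."""
    start = max(a.onset_ms, b.onset_ms)
    end = min(a.onset_ms + a.duration_ms, b.onset_ms + b.duration_ms)
    return max(0.0, end - start)


def predicted_interaction(a, b, field_width_deg=20.0):
    """Toy reading of the space and time rules; the field width is an assumption."""
    if temporal_overlap_ms(a, b) <= 0.0:
        return "no interaction"
    if abs(a.location_deg - b.location_deg) <= field_width_deg / 2:
        return "enhancement"
    return "depression or no interaction"


# Onsets differ by 60 ms, yet the activity patterns still overlap.
visual = Activity(onset_ms=0.0, duration_ms=100.0, location_deg=10.0)
auditory_near = Activity(onset_ms=60.0, duration_ms=80.0, location_deg=12.0)
auditory_far = Activity(onset_ms=60.0, duration_ms=80.0, location_deg=60.0)

print(temporal_overlap_ms(visual, auditory_near))
print(predicted_interaction(visual, auditory_near))
print(predicted_interaction(visual, auditory_far))
```

In this sketch the two stimuli need not start together; what matters is that their activity windows overlap and that they fall within the same assumed receptive field. 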


Although they found considerable evidence supporting a neurological basis for sensory 
integration, Stein and Meredith conclude that “an enormous number of challenges must 
be met before we understand more fully the process involved in integrating information 
from different sensory modalities” (see Figure 5). 





Figure 5. Neurons Synthesize Information from Different 
Sensory Modalities From [STEI93]. 


D. AUDITION 


1. Definition 
Before audition can be defined, we need to have an understanding of what is 
meant by sound. The following gives a formal definition of sound: 


Sound is the perception by humans of vibrations in some physical medium, usually 
air. These physical vibrations of the air are evidenced by alternating rarefactions and 
compressions. Man’s primary sense organ for the sound stimulus is the ear. [SILB68] 
(see Figure 6) 


The formal definition of hearing (the sense of audition) from a physiological perspective 
is as follows: 


Hearing is the response of an animal to sound vibrations by means of a special organ 
for which such vibrations are the most effective stimulus. The critical phrase here is 
“most effective,” which means that this special organ (which we shall call an ear) is more 
sensitive to sound than it is to any other form of energy. All other mechanoreceptors 
respond to acoustic vibrations if these vibrations are strong enough and sufficiently low 
in frequency, but they do so crudely, requiring large amounts of energy in comparison 
with what they require in the stimuli that are most appropriate to them and in relation to 
what the ear requires within its proper frequency range. Organs in the skin (tactual and 
deep pressure endings) in muscles, tendons, and joints (kinesthetic endings), in the 
vestibular labyrinth (gravity and motion receptors), and even pain organs throughout the 
body can all be excited by sounds of sufficient strength. But none of these organs 
approaches the ear in delicacy and in the effectiveness of utilization of sounds as a means 
of gaining information about the outside world. [WEVE74] 
Figure 6. The Ear From [MURC73]. 


In other words, although the entire human body is capable of hearing sounds, the ear is 
the most sensitive to sound, which in turn makes it the primary mechanism for hearing 
sounds. 


2. Subjective Evaluation 
Given that we can hear sounds, how do we rate the quality of sound? What is of 
good quality to one person may be of bad quality to another. As a result, rating the 
quality of sound is a subjective task based largely on the rendering capability of the 
equipment that is generating the sound. Another aspect of the quality of sound is that of 
content. For example, some may like to listen to rock-and-roll, where intentional 
distortion is often reproduced as high quality, whereas others may think the musical 
quality of rock-and-roll is poor. Content is an important consideration when conducting 
sound quality tests of loudspeakers or headphones, and studies have shown that when 
conducting sound quality experiments “...the problem of selecting test material was 
evident. Relevant test material has not yet been defined. Different recording techniques 
influence the assessment of the sound quality” [THEI86]. Although content is important, 
this research effort focuses on the perception of the physical characteristics of the sound. 
But which physical characteristics, dimensions, or attributes of sound are appropriate to 
rate? 

Zwicker and Zwicker [ZWIC91] propose that: 


The information received by our auditory system can be described most effectively in 
the three dimensions of specific loudness, critical-band rate, and time. The resulting 
three-dimensional pattern is the measure from which the assessment of sound quality can 
be achieved. [ZWIC91] 


In experiments conducted to identify perceived sound quality of loudspeakers, 
Gabrielsson and Lindström had subjects rate music on a category scale from 0-10 using 
the following dimensions: “Clarity, Fullness, Spaciousness, Brightness, Softness, 
Absence of Extraneous Sounds, and Fidelity” [GABR85], as depicted in Figure 7. 


Figure 7. Sound Quality Rating Scale From [GABR85]. 


Based on Gabrielsson and Lindström’s efforts, Toole [TOOL85] expanded the 
dimensions on which to rate sound quality to include a specific rating format for spatial 
quality, as depicted in Figure 8. 


Figure 8. Spatial Quality Rating Scale From [TOOL85]. 


In evaluating the quality of loudspeakers using an impulsive tone-burst signal, 
Furmann et al. [FURM90] had subjects rate the following attributes on a scale of 0-10: 


1) Sharpness -- The sound contains components whose mid- and high-frequency levels 
are too high. 

2) Pureness -- The sound is not distorted, devoid of sounds not appearing in the 
signal, readable in the entire frequency range. 

3) Equalness -- The sound retains the proportion of tones; it is linear without 
expansion of tones. 

4) Clearness -- The sound is pure and clear; different instruments and voices can be 
distinguished easily; onsets and transients in the music can be perceived easily. 

5) Feeling of Space -- The reproduction is spacious; the sound is open, has width and 
depth, fills the room, gives the impression of the subject’s presence in the space 
surrounded by sound. [FURM90] 


In relating subjective and objective acoustical measurements, Burkhard and 
Genuit [BURK92] recognize that any acoustical measurement system should yield 
information that relates to how humans hear. As such, Burkhard and Genuit identify the 
relevant parameters that are involved during the classification of a sound event by a 
human listener, as seen in Figure 9. 


Figure 9. Parameters Relevant to Evaluation of Sound by Human Listeners 
From [BURK92]. 


In terms of spatial hearing, Blauert [BLAU97] identifies proven and 
hypothesized psychophysical theories corresponding to positional auditory events. These 
events are categorized as follows: Basic vs. Supplemental, Homosensory vs. 
Heterosensory, and Fixed-position vs. Motional. The physical processes and phenomena 
which make use of these psychophysical theories are outlined in Figure 10. For more 
insights into how humans perceive the quality of sound, see the following: [BECH90] 
[TOOL90] [VIEM90] [BURK92] [THUR92]. 


Figure 10. Psychophysical Theories of Spatial Hearing From [BLAU97]. 
Categories: Basic (B) vs. Supplemental (S); Homosensory (Ho) vs. Heterosensory (He); 
Fixed-position (F) vs. Motional (M). 


E. VISION 


1. Definition 


A formal definition of vision is as follows: 


Vision is a complex phenomenon consisting of several basic components. Light from 
external sources is brought to a focus on the retina of the eye. Changes are produced 
which initiate electrical impulses. These are conducted over the optic nerve and optic 
tract to the brain where the visual sensation is perceived and interpreted. [MCNA68] (see 
Figure 11) 


Figure 11. The Eye From [MURC73]. 


2. Subjective Evaluation 

An approved method for the subjective evaluation of visual displays can be found 
in the Method for the Subjective Assessment of the Quality of Television Pictures 
published by the International Telecommunication Union in Geneva [GENE86]. This 
publication recommends using a five-point rating scale for evaluating quality. The five 
points on the rating scale are as follows: 1 Bad, 2 Poor, 3 Fair, 4 Good, and 5 Excellent. 
The use of non-expert observers is also recommended, and the number of observers 
should be at least ten and preferably twenty. In addition, the publication recommends that an 
experimental testing session should not last more than roughly 30 minutes, and that a 
duration of 10 seconds for visual stimuli is sufficient for still or moving sequences. 
Furthermore, the publication suggests that the presentation of visual stimuli may be 
based on a randomized-block design derived from Greco-Latin squares. (See [GOOD95] 
for an example of the Latin squares technique.) 
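
A randomized-block design derived from Greco-Latin squares can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure prescribed in [GENE86] or [GOOD95]: the function names are hypothetical, and the construction shown is the standard one that works only for odd orders. Each cell pairs two factors (for example, a picture and a quality level) so that every value of each factor appears once per row and once per column, and every pairing of the two factors appears exactly once. 

```python
def greco_latin_square(n):
    """Build an n x n Greco-Latin square for odd n.

    Cell (r, c) holds the pair ((r + c) mod n, (r + 2c) mod n).
    """
    if n < 1 or n % 2 == 0:
        raise ValueError("this simple construction needs an odd order n")
    return [[((r + c) % n, (r + 2 * c) % n) for c in range(n)]
            for r in range(n)]


def is_greco_latin(square):
    """Check the Latin property of both factors and the uniqueness of pairs."""
    n = len(square)
    symbols = set(range(n))
    for k in (0, 1):  # k selects the first or second factor of each pair
        for r in range(n):
            if {square[r][c][k] for c in range(n)} != symbols:
                return False
        for c in range(n):
            if {square[r][c][k] for r in range(n)} != symbols:
                return False
    pairs = {cell for row in square for cell in row}
    return len(pairs) == n * n


# Rows could stand for observer groups and columns for presentation order;
# that mapping is an assumption made for illustration only.
for row in greco_latin_square(3):
    print(row)
```

Shuffling the rows, the columns, and the labels of each factor preserves the Greco-Latin property, which is one way such a square can serve as the basis for a randomized presentation order. 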

After an exhaustive literature review, Padmos and Milders [PADM92] present a 
long list of quality criteria for simulator images. This list includes criteria based on: 
Visually Perceiving the Environment, Physical Image Properties, Image Capacity, 
Appearance of Surfaces, Visibility and Light Effects, and other miscellaneous features. 
The target simulator for these quality criteria is the vehicle simulator, but the criteria 
apply equally well to virtually any type of simulator image. 


3. Visual Dominance 

The current view of visual dominance can be attributed to the work of Posner et 
al. (see [POSN76]). Posner’s efforts tried to identify why the visual modality tends to 
“dominate conscious judgements about the presence and location of objects” [POSN76]. 
Posner’s general theory of visual dominance includes the following four propositions: 


Proposition 1. Visual stimuli are not as automatically alerting as stimuli in other 
modalities. 


Proposition 2. In order for a visual event to serve as an effective alerting stimulus, the 
subject must first process it by active attention. 


Proposition 3. The consequence of active attention toward any one modality is a 
reduction in the availability of the attentive mechanisms to input from other modalities. 


Proposition 4. To compensate for the low alerting capability of visual signals, subjects 
exhibit a general attentional bias toward the visual modality whenever they are likely to 
receive reliable input from that modality. This bias may not be obvious to them, but it can 
be viewed as a strategy of a very pervasive sort. [POSN76] 


F. ATTENTION 


“The essence of the concept of attention is the focusing of awareness” 
[DEMB79]. Our span of attention is derived from our span of perception. Perception 
spans the range from subliminal stimuli (unconscious awareness) to liminal stimuli 
(conscious awareness) as depicted in Figure 12. Using the common searchlight metaphor, 
the three main aspects of attention in perception are as follows: 
1) Selective Attention: corresponds to the direction of the searchlight; 2) Focused 
Attention: corresponds to the immediate center of the beam of light illuminated by the 
searchlight; and 3) Divided Attention: corresponds to both the immediate center of the 
beam of light and the fringe just outside the beam of light. Overall, attention plays a 
pivotal role in human information processing, one that not only selects information 
sources to process but also acts as a commodity or resource of limited availability 
[WICK92] (see Figure 13). 


Figure 12. The Span of Attention and the 
Span of Perception From [DEMB79]. 


1. Selective Attention 


As the searchlight metaphor explains, selective attention directs the searchlight. 
Thus, selective attention is concerned with the process of how, when, what, and where we 
actually focus on (or attend to) various and numerous stimuli. The selection process acts 
as a sort of filter between sensory processing and attention, as depicted in Figure 14. 
Numerous theories over the years have tried to describe the nature of this selection 
process. One of the more popular theories is Broadbent’s Filter Theory [BROA58]. 


a. Broadbent’s Filter Theory 


Broadbent proposed that the brain contains a selective filter which chooses messages 
on the basis of physical characteristics toward which it is “tuned” and rejects others. The 
filter spares the limited-capacity system from being overloaded; complex forms of input 
are rejected on the basis of simple qualities, and a higher-level analysis of them need not 
occur. ...In essence, the filter model views the selective nature of attention as resulting 
from restrictions in the capacity of the nervous system to process information. 
...Preference is shown for novel or intense events, acoustic over visual signals, sounds of 
high frequency, and signals of biological importance to the organism. [DEMB79] (see 
Figure 15) 


Figure 13. A Model of Human Information Processing From [WICK92]. 


Figure 14. Selective Attention From [MURC73]. 


Figure 15. Information-Flow in Broadbent’s Filter Theory From [DEMB79]. 


b. Filter Attenuation Theory 


Although the Filter Theory seemed adequate, a number of studies, 
primarily conducted by Anne Treisman [TREI69] [TREI73], soon identified certain 
limitations. As a result, a modification was made to the Filter Theory resulting in the 
Filter Attenuation Theory. 


The essence of this modification is that filtering is not an all-or-none affair. Treisman 
suggested that the filter does not cut off rejected messages entirely, but instead attenuates 
their strength. Thus, under some conditions, the weakened signals can still contact 
higher-level elements of the perceptual system. [DEMB79] (see Figure 16) 
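

The difference between an all-or-none filter and an attenuating filter can be made concrete with a toy sketch. Nothing here is taken from [DEMB79] or Treisman's papers: the channel names, signal strengths, attenuation factor, and thresholds are hypothetical values chosen only to show how a weakened signal might still reach higher-level analysis under some conditions. 

```python
def all_or_none_filter(channels, attended):
    """Filter Theory sketch: unattended channels are blocked entirely."""
    return {name: (strength if name == attended else 0.0)
            for name, strength in channels.items()}


def attenuating_filter(channels, attended, attenuation=0.2):
    """Filter Attenuation sketch: unattended channels are weakened, not blocked."""
    return {name: (strength if name == attended else strength * attenuation)
            for name, strength in channels.items()}


def reaches_higher_analysis(filtered, thresholds, default_threshold=0.5):
    """Channels whose filtered strength meets their (assumed) threshold."""
    return sorted(name for name, strength in filtered.items()
                  if strength >= thresholds.get(name, default_threshold))


# Two equally strong messages; the listener attends to the left ear.
channels = {"left_ear": 1.0, "right_ear": 1.0}
# Assumption: the unattended message is easy to detect (low threshold).
thresholds = {"right_ear": 0.1}

blocked = all_or_none_filter(channels, attended="left_ear")
weakened = attenuating_filter(channels, attended="left_ear")

print(reaches_higher_analysis(blocked, thresholds))
print(reaches_higher_analysis(weakened, thresholds))
```

Under these assumed values only the attended channel survives the all-or-none filter, whereas the attenuated channel still exceeds its low threshold. 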


c. Response-Selection Theory 

An entirely different perspective on selective attention was formalized by 
Deutsch and Deutsch [DEUT63]. This theory, called the Response-Selection Theory, 
maintains “...that all mental inputs are fully analyzed perceptually and that selection takes 
place only when the observer responds to stimuli” [DEMB79]. 


Figure 16. Information Flow in Treisman’s 
Filter Theory From [DEMB79]. 


d. Hybrid Theory 


Recognizing the debate over the various theories of selective attention 
(which continues still today), Dember [DEMB79] suggests another possible solution as 
follows: 


It is conceivable that our cognitive capacities are more flexible than we have been 
willing to assume, and that both perceptual and response selection can take place under 
appropriate circumstances. ... This new breed of attentional theory may very well prove of 
conceivable value in directing research toward a more satisfactory solution to the mystery 
of selective attention. [DEMB79] 


2. Divided Attention 

Whereas selective attention deals with our ability to direct our focus among 
stimuli, divided attention deals with our ability to divide our attention among stimuli or 
tasks. Divided attention occurs when “the task is to attend to several simultaneously 
active input channels or messages, responding to each as needed” [BOFF86]. Early 
researchers believed that it was impossible to attend to several simultaneous stimuli -- 
that attention was indivisible. Nowadays, divided attention is readily accepted, but how 
we divide our attention has raised considerable debate. The issue is whether we 
process simultaneous inputs in parallel or in serial. However, the conclusions drawn from 
considerable research suggest that “...both modes of processing occur, depending on the 
task and on the circumstances,” [KAHN73] and on whether the stimuli are intramodal 
or intermodal. Our ability to divide our attention among various stimuli directly 
corresponds to our limited ability to time-share among these various stimuli. 


3. Time-Sharing 


Our ability to time-share depends on how efficiently we schedule and switch 
between various stimuli. For example, if we are given plenty of time to complete two 
separate tasks, we will probably complete one task and then switch to completing the other 
task. However, if the amount of time we are given is drastically reduced, we might have 
to engage in completing both tasks concurrently. Processing tasks concurrently leads to 
three further factors which will influence our ability to successfully complete concurrent 
processing. These factors are: confusion of the task, cooperation between task processes, 
and competition for task resources. [WICK92] 


Confusion results when elements for one task become confused with the processing of 
another task because of their similarity. 


Cooperation occurs when there is a high similarity of processing routines between 
tasks which can result in the possible integration of the two task elements into one. 


Competition, the critical element of concurrent task time-sharing, relates to the level 
of difficulty between the tasks -- the greater the difficulty, the greater the competition. 
[WICK92] 


When we say that difficult tasks (stimuli) are in competition with one another, this 
competition refers to competing for the limited amount of total available resources 
needed to complete the tasks. With this in mind, there are two theories on how resources 
are allocated to attention: 1) Single-Resource Theory, and 2) Multiple-Resource Theory. 


Figure 17. Single Resource Theory From [WICK92]. 


a. Single-Resource Theory 


The Single-Resource Theory (see [KAHN73]) argues that we have one 
single supply of undifferentiated resources available to all tasks and mental activities. 
“As task demands increase either by making a given task more difficult or by imposing 
additional tasks, physiological arousal mechanisms produce an increase in the supply of 
resources” [WICK92]. The Single-Resource Theory is depicted in Figure 17. The main 
limitation of this theory is that it compares task difficulty within the same dimensional 
constraints. As such, it does not consider the structure of the task as it relates to the 
processing of the task such as its Codes, Modalities, and Stages [WICK92]. Correcting 
this limitation provides the impetus for the Multiple-Resource Theory. 


Figure 18. Multiple Resource Theory From [WICK92]. 


b. Multiple-Resource Theory 
The Multiple-Resource Theory stipulates that tasks are processed based on 
multi-dimensional constraints. These constraints involve the task’s Codes (Spatial vs. 
Verbal), Modalities (Auditory vs. Visual), and Stages (Encoding, Central Processing, and 
Responding) as depicted in Figure 18. As such, “...people have several different 
capacities with resource properties. Tasks will interfere more and difficulty-performance 
trade-off’s will be more likely to occur, if more resources are shared.” [WICK92] For 
example, two visually dominating tasks may compete for the same resources resulting in 
greater interference (competition) of the two tasks. But if one task is visually dominating 
and one task is aurally dominating, they may not have to compete with each other, for 
they utilize separate resources as depicted in Figure 18 as opposed to common resources 
as depicted in Figure 17. 
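

The idea that tasks interfere more when they share more resources can be sketched with a toy comparison. This is not Wickens' model as published in [WICK92]: the `Task` record, the one-value-per-dimension simplification, and the example tasks are assumptions made purely for illustration. The sketch lists which of the three dimensions (Codes, Modalities, Stages) two tasks have in common. 

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical task description with one dominant value per dimension."""
    name: str
    code: str      # "spatial" or "verbal"
    modality: str  # "auditory" or "visual"
    stage: str     # "encoding", "central processing", or "responding"


def shared_resources(a, b):
    """List the dimensions on which two tasks draw on the same resource."""
    shared = []
    if a.code == b.code:
        shared.append("code:" + a.code)
    if a.modality == b.modality:
        shared.append("modality:" + a.modality)
    if a.stage == b.stage:
        shared.append("stage:" + a.stage)
    return shared


# Example tasks are invented for illustration.
map_reading = Task("map reading", "spatial", "visual", "encoding")
lane_keeping = Task("lane keeping", "spatial", "visual", "encoding")
radio_listening = Task("radio listening", "verbal", "auditory", "encoding")

print(shared_resources(map_reading, lane_keeping))
print(shared_resources(lane_keeping, radio_listening))
```

In this toy reading, the two visual tasks share all three dimensions and would be expected to compete more, while the visual and auditory tasks share only the processing stage. 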




4. Sustained Attention 


Sustained attention deals with our ability to maintain focused attention over 
prolonged time periods. Sustained attention is commonly referred to as vigilance. During 
the Cold War years (1950s through 1980s), there was an increased threat of global 
thermonuclear war. As such, radar operators monitored their radar scopes for potential 
incoming missiles for prolonged periods of time (vigilance). Because of the severe 
repercussions that could result if a radar and/or sonar operator missed a blip on the 
scope, the study of vigilance became very popular (on both sides of the Cold War). The 
results of these studies provided new insights into such theories as: Vigilance, Signal 
Detection, Expectancy, Arousal, and Habituation. The concept of sustained attention does 
not play a role in this dissertation. It is presented to complete the discussion of 
attention and to clarify the issues of attention that are relevant to this research effort. 
During the preliminary literature review of this dissertation, much time was spent 
reviewing auditory-visual vigilance studies. For a listing of pertinent auditory-visual 
cross-modal signal detection and vigilance research, see APPENDIX B. AUDITORY-VISUAL 
CROSS-MODAL SIGNAL DETECTION AND VIGILANCE BIBLIOGRAPHY. 


5. Cognitive Ecology Perspective 


Ecology is the study of the interaction of living creatures with their environment. 
For ecological psychology, the focus is the relation of mind to environment. Cognitive 
Ecology is a new field, “... a deep ecology of the mind, in which mind and environment 
are treated not as separate objects or topics but as codefining poles of experiences and 
actions” [FRIE96]. In the book Cognitive Ecology [FRIE96], two qualitatively different 
aspects of attention are described as having: (1) a clear nucleus of focus of attention, and 
(2) a fringe to that experience. The focus of attention refers to the typical searchlight 
metaphor of attention. The fringe refers to: 


... many types of experience, such as: (1) feelings of familiarity, (2) feelings of 
knowing, such as tip-of-the-tongue-experiences, (3) feelings of relation between objects 
or ideas, (4) feelings of action tendency, as in intentions, (5) feelings of expectancy, (6) 
feelings of rightness or being on the right track. ...(7) metaknowledge of one’s memory or 
one’s abilities... [and] (8) Perhaps the most pervasive fringe feeling is that of 
meaningfulness, that one knows the larger context of any given moment of focal attention 
although that context is not part of the content of attention. [FRIE96] 


There are three issues on which this fringe experience is relevant to cognitive ecology: 1) 
the issue of knowledge of content, 2) the issue of capacity, and 3) the issue of agency. 
The second issue, that of capacity, identifies potential shortcomings of the traditional view 
of attention. Specifically: 


Attention is normally viewed either explicitly, or more recently implicitly, as a 
limited-capacity system. ...This may be because only focal attention is normally 
investigated. A mind that is defined literally as part of its environment (the subjective 
pole of attention in a subject-object field) should have much broader attentional 
capacities than a mind defined as separate. Many of the anomalies of attention and 
consciousness research, such as blindsight and the other agnosias, are cases that violate 
the standard limited-capacity conception. Investigation of fringe phenomena may serve to 
expand, or perhaps undermine, models of attentional limits. [FRIE96] 


G. GESTALT THEORY 


Gestalt Theory was founded by German psychologists Max Wertheimer 
[WERT12], Kurt Koffka [KOFF35], and Wolfgang Köhler [KOHL40]. The basic idea of 
Gestalt Theory is that we perceive things wholistically, as wholes rather than as separate 
parts. “Certainly to process information as wholistic or gestalt stimuli rather than as 
separate elements is an efficient thing for the organism to do -- and possibly that is the 
advantage of gestalt patterns” [GARN70]. As a result, to view things as wholes rather 
than as parts, we perceptually organize things, objects, etc. into groups. The Gestalt 
Factors of Perceptual Organization include the following: 


1) Factor of Similarity, 2) Factor of Proximity, 3) Factor of Common Fate, 4) Factor 
of Objective Set, 5) Factor of Inclusiveness, 6) Factor of Good Continuation, 7) Factor of 
Closure, 8) Factor of Fixation, 9) Factor of Contour, and 10) Factor of Object 
Interdependence. [MURC73] 


Gestalt Theory was developed primarily to explain how we perceptually group visual 
objects, but its concepts can also be applied to the other senses. 


H. SYNESTHESIA 


One of today’s leading experts in the study of synesthesia is Richard Cytowic. He 
defines synesthesia as 


...an involuntary joining in which the real information of one sense is accompanied by 
a perception in another sense. In addition to being involuntary, this additional perception 
is regarded by the synesthete as real, often outside the body, instead of imagined in the 
mind’s eye. [CYTO89] 


It is estimated that synesthesia occurs in about one in 25,000 individuals [CYTO95], so 
its occurrence is fairly rare. The most common form of synesthesia is colored hearing. 
A synesthete experiences colored hearing when certain sounds (physical 
stimuli) evoke perceptions of various colors. For example, when listening to certain 
classical music, a synesthete might experience shades of blue and/or green. Another, 
more bizarre example is that of 
gustatory-tactile synesthesia. In this case, the synesthete experiences (perceives) certain 
shapes based on various tastes (physical stimuli) (see Figure 19). In fact, because of the 
bizarre nature of this condition, Cytowic wrote an entire book based on the research of a 
man with gustatory-tactile synesthesia. See [CYTO93] for an in-depth review of 
gustatory-tactile synesthesia. 

The concept of synesthesia dates back over two hundred years. For an exhaustive 
survey of classic and contemporary synesthesia literature covering this 
interval, see [BARO96]. The validity of synesthesia, though, has suffered over the years 
because it is introspective in nature. However, Cytowic has helped to validate synesthesia by 
examining the neural substrates of synesthesia as outlined in [CYTO89] [CYTO93]. The 
results of Cytowic’s research indicate that: 


The synesthetic experience may be a result of a fundamentally mammalian process in 
which the cortex briefly ceases to function in the modern manner, permitting the senses 
to fuse, or, rather, we should say, perceive fusion that may be there all along but that 
never arises to consciousness. At its essence, synesthesia may be a remnant of how early 
mammals perceived their world. ...Synesthesia is what we all do without knowing that we 
do it, whereas synesthetes do it and know that they do it. [CYTO89] 


Figure 19. Tasting Shapes From [CYTO89]. 


I. MULTIMEDIA 


“According to a recent projection, multimedia and creative technologies will 
represent a new market of $40 billion by the year 2000 and $65 billion by the year 2010” 
[GUPT97]. As such, there is indeed a market emphasis on multimedia and there are still 
many unanswered questions. To support the continued growth of multimedia, it must 
expand and develop in parallel with internet technology, not as an afterthought or as an 
add-on. As such, 


... the central integrated media-systems-related issue that must be addressed during 
the next decade is storage, indexing, structuring, manipulating, and “discovery” of 
integrated multimedia information units (MIUs) that include structured data values 
(strings and numbers), text, images, audio, and video. The key research focus in this area 
centers on managing multimedia information units in the context of a highly distributed 
and interconnected network of information collections and repositories. Current data and 
knowledge management technology that addresses collections of formatted data and text 
is inadequate to meet the needs of video and audio information, as well as the mixture of 
modalities in MIUs. [GUND97] 
In [BLAT96], Blattner and Glinert express the need for a greater understanding of 
multimodal integration. They correctly recognize that “Although we have seen much 
progress in recent years in the use of single modalities, the general problem of designing 
integrated multimodal systems is not well understood” [BLAT96]. One of the reasons for 
the current lack of integrated multimodal systems is that the system designers, i.e., 
computer scientists, are not knowledgeable about the issues associated with multimodal 
concepts. Thus, 


...the (computer) scientists who design the new interfaces and human-computer 
communications devices must address issues whose solutions lie outside of their 
discipline. Integrating modalities requires understanding how people use their various 
senses to perceive and interact with the world around them. Despite more than 100 years 
of research into these issues, much remains unknown. [BLAT96] 


As a result, “Research by non-computer scientists shows that computer scientists have 
sometimes failed to appreciate the distinction between human and computer modalities” 
[BLAT96]. This explains why it is typical to judge a simulation or virtual environment by 
the auditory and visual technical rendering capabilities of the system (computer and 
displays), as opposed to how well the auditory and visual sensory modalities of the 
immersed participant, i.e., an engaged human, are stimulated. 

Brenda Laurel [LAUR93] provides numerous insights into the use of multimedia 
and human-computer interaction. She states that “Multiple modalities are desirable only 
insofar as they are appropriate to the action being represented” [LAUR93]. With an 
artistic background, Laurel brings a much-needed dimension to the field of multimedia. With 
her creative experience, she correctly recognizes that an artistic touch can lead to better 
(smarter) multimodal integration in multimedia systems. Accordingly, Laurel states: 


But we mustn’t fall prey to the notion that more is always better, or that our task is the 
seemingly impossible one of emulating the sensory and experimental bandwidth of the 
real world. Artistic selectivity is the countervailing force -- capturing what is essential in 
the most effective and economic way. A good line-drawn animation can sometimes do a 
better job of capturing the movements of a cat than a motion picture, and no photograph 
will ever capture the essence of light in quite the same way as the paintings of Monet. 
The point is that first-person sensory and cognitive elements are essential to 
human-computer activity. There is a huge difference between an elegant, selective multi-sensory 
representation and a representation that squashes sensory variety into a dense but 
monolithic glob of text. [LAUR93] 


Thus, we must not assume that we always need the best possible graphics and audio. The
particular application, overall sensory perception, and creative use of stimuli ought to
drive fidelity requirements.


J. SUMMARY


In summary, this chapter has provided the computer scientist with a high-level
overview of Perception, The Senses, Audition, Vision, Attention Theory, Gestalt Theory,
Synesthesia, and Multimedia.


III. LITERATURE REVIEW


A. INTRODUCTION 


This chapter presents a literature review on relevant auditory-visual cross-modal 
perception phenomena. Whereas the background provided in the previous chapter 
presents a general overview of the concepts underlying the psychological and 
physiological nature of auditory and visual perception, this chapter specifically focuses 
on VEs and auditory-visual intersensory phenomena. Using the background provided in 
the previous chapter, the reader can better understand the theoretical basis and overall
findings of the numerous auditory-visual research endeavors outlined in this chapter.


B. VIRTUAL ENVIRONMENTS 


1. Definition 


The National Research Council’s (NRC) Committee on Virtual Reality Research 
and Development defines VE systems with the following explanation: 


Virtual environment systems differ from other previously developed computer-centered
systems in the extent to which real-time interaction is facilitated, the perceived
visual space is three-dimensional rather than two-dimensional, the human-machine
interface is multimodal, and the operator is immersed in the computer-generated
environment. [DURL95]


But what does virtual mean? Ellis [ELLI96] tries to clarify the term virtual by
introducing the concept of virtualization, which is the “...process by which a viewer
interprets patterned sensory impressions to represent objects in an environment other than
that from which the impressions physically originate” [ELLI96]. Ellis continues to
explain that virtualization applies primarily to vision and audition and that there are three
levels of virtualization: Virtual Space, Virtual Image, and Virtual Environment, as
depicted in Figure 20. Furthermore, because of the diverse nature of VEs, the NRC
Committee explains that the development of a VE requires “...a crucial need for
cooperation among many disciplines, including computer science, electrical and
mechanical engineering, sensorimotor psychophysics, cognitive psychology, and human
factors” [DURL95]. Cross-disciplinary transfer of knowledge is typically lacking,
causing a potential degradation of VE development. This dissertation attempts to better
facilitate cross-disciplinary transfer of knowledge and, hopefully, to improve VE
development with respect to auditory-visual cross-modal perception considerations.


Figure 20. Levels of Virtualization From [ELLI96].


2. Multimodal Concerns 


“...the development of multimodal synthetic environments is an extremely
important and challenging endeavor. [It]...requires that we carefully examine our current
assumptions concerning VE architectural requirements and design constraints”
[DURL95]. One of the first multimodal networked VEs was that of Networked SPIDAR
[ISHI94]. In this networked VE, participants collaborated on the design of 3D objects
using visual, audio, and haptic information. The developers of Networked SPIDAR
believed that “A networked virtual environment must support these interactions [visual,
audio, and haptic] without contradiction in either time or space” [ISHI94]. Gupta et al.
[GUPT97] also describe experiments using multimodal environments to enhance
computer-aided design (CAD). They describe the relationship of the inserted human
participant to auditory, visual, and haptic feedback devices as depicted in Figure 21.


Figure 21. Multimodal Modes in Virtual Environments From [GUPT97].


However, the majority of research and development in VEs has typically focused on the
sense of vision (i.e., the visual channel). Accordingly:


To date much of the design emphasis in VE systems has been dictated by the 
constraints imposed by generating the visual scene. The nonvisual modalities have been 
relegated to special-purpose peripheral devices. ... However, many of the issues involved
in the modeling and generation of acoustic and haptic images are similar to the visual 
domain; the implementation requirements for interacting, navigating, and communicating 
in a virtual world are common to all modalities. Such multimodal issues will no doubt 
tend to be merged into a more unitary computational system as the technology advances 
over time. [DURL95] 


Thus, proper VE development must focus on all modalities equally. This focus on the
modalities must concentrate not only on the intra-relationships but also on the
inter-relationships. As the NRC Committee explains: “Detailed study of both intrasensory and
intersensory illusions is important because, in many cases, the existence of illusions
enables SE [synthetic environment] systems design to be simplified and therefore to
increase its cost-effectiveness” [DURL95]. Furthermore, under the category of
Psychological Considerations, the NRC Committee recommends further study in
“channel-interaction effects that occur with multimodal interfaces.” Some notable
channel-interaction (intersensory) effects:


...include those on the dominance of vision over audition and haptics in cases of
intermodality conflict (e.g., as evidenced in the ventriloquist effect) and on the use of
auditory stimuli to improve the perception of events that are represented primarily in the
visual or haptic domains (as in the use of sound effects) [DURL95].


It seems fairly obvious by this point that proper development of VEs must 
consider multimodal factors. Since we currently have the technology to render very high 
quality auditory and visual displays, the proper use of this technology must not neglect 
potential auditory and visual cross-modal perception phenomena. Brenda Laurel makes 
the point that auditory and visual cross-modal issues have always been a consideration in 
the art world. Now with the recent surge in the development of VE technology, the same 
cross-modal considerations of the Arts apply to VEs. Brenda Laurel states: 


VR has reinvigorated and recontextualized the study of human sensation and 
perception. While much is known about the human visual or auditory or tactile senses, 
relatively little is known “scientifically” about how these senses combine. Still less is 
known about how they combine in the context of representations, as opposed to the 
context of the actual world. For example, it is well known in the folklore of computer 
game design that high-quality audio makes people perceive visual displays to have higher 
resolution. It is also well-known that the converse is not true: Great graphics will not turn 
a PC’s beeps and boops into Beethoven. The study of sensory combinatorics, that is, how
vision affects audition or how the two in concert affect emotion, was almost exclusively
the province of the arts until VR came on the scene. [LAUR93]


3. Fidelity Requirement 

What are the fidelity requirements of a VE? First and foremost (and sometimes
neglected), the intended outcomes of the particular application ought to drive the fidelity
requirements. For example, the visual fidelity of a VE intended to train surgeons in
open-heart surgery probably needs to be greater than the visual fidelity of a VE intended to
teach children how to read. Another consideration is that of the human sensory system:
the fidelity requirements of VEs need not exceed those of the human perceptual system. As
such, “Knowledge of normal human resolving power on the input side, i.e., the sensory
side, allows one to predict the display resolution beyond which finer resolution cannot be
perceived and would therefore be wasted” [DURL95]. For example, the auditory fidelity
of many VEs, in terms of frequency range, need not exceed that of the nominal range of
human hearing (i.e., 20 Hz - 20 kHz). A caveat pertains here: some research indicates that
our perceptual frequency range is much greater (see [OOHA91] [BOYK97]).
Nevertheless, the capabilities of the human sensory system ought to drive the fidelity
requirements of VEs as depicted in Figure 22.


Figure 22. Computer Technology Organization for Virtual Reality
From [DURL95].
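

The resolving-power argument above can be made concrete for the auditory channel. The
following is a minimal sketch, not part of the original study: it assumes ideal
band-limited sampling and applies the Nyquist criterion (a sampled signal can represent
frequencies up to half its sampling rate) to the nominal 20 kHz upper limit of hearing
cited in the text. The function names and the three example sampling rates are
illustrative assumptions only.

```python
"""Nyquist-rate check against the nominal range of human hearing (illustrative sketch)."""

# Nominal upper limit of human hearing cited in the text (20 Hz - 20 kHz).
HEARING_MAX_HZ = 20_000.0


def min_sampling_rate(max_frequency_hz: float) -> float:
    """Smallest sampling rate (Hz) whose Nyquist limit reaches max_frequency_hz."""
    return 2.0 * max_frequency_hz


def nyquist_limit(sampling_rate_hz: float) -> float:
    """Highest frequency (Hz) representable at the given sampling rate."""
    return sampling_rate_hz / 2.0


def covers_nominal_hearing(sampling_rate_hz: float) -> bool:
    """True if the sampling rate can represent the full nominal hearing range."""
    return nyquist_limit(sampling_rate_hz) >= HEARING_MAX_HZ


if __name__ == "__main__":
    # Hypothetical example rates; they are not taken from the source text.
    for rate in (8_000.0, 22_050.0, 44_100.0):
        print(rate, nyquist_limit(rate), covers_nominal_hearing(rate))
```

Under these assumptions, any sampling rate above roughly 40 kHz already spans the nominal
hearing range, so a higher rate would add fidelity that, by the argument quoted from
[DURL95], could not be perceived. The caveat about extended perceptual range noted above
would shift the threshold, not the reasoning.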


Details regarding humans’ ability to detect and discriminate visual, auditory,
tactile, and kinesthetic information, along with corresponding technical specifications of
VE equipment, are presented in the excellent paper by Barfield et al. [BARF95]. Barfield
states that “It is important to have a thorough understanding of the capabilities of the
human’s sensory systems and to use this knowledge in the design of virtual worlds and in
deriving technical specifications for virtual environment equipment” [BARF95].




When Barfield compares the human sensory system with technical specifications 
of VEs, he considers the modalities as separate entities. However, the VE participant, 
being human, is multimodal by nature. As a result, one very key consideration neglected 
in Barfield’s paper is how the senses interact, and another is how this sensory interaction 
may or may not conflict with how the singular modality capabilities derive the 
specifications of VEs. The NRC Committee also recognizes that visual fidelity
requirements are influenced by other modalities and that a greater understanding of
multimodal integration is needed in hopes of answering the following unanswered
questions:


How are the required visual display system parameters affected within multimodal 
systems? Can visual display system requirements be relaxed in multimodal display 
environments? What are the perceptual effects associated with the merging of displays 
from different display sources? [DURL95] 


One factor in considering auditory and visual fidelity requirements is that of display 
resolution. In a VE, the auditory and visual resolutions ought to be properly matched. As 
Brenda Laurel correctly states:


... we also sometimes expect certain kinds of patterns to occur. Although, there are
many reasons for emphasizing one modality over another, we tend to expect that the 
modalities involved in a representation will have roughly the same “resolution.” A 
simplistic cartoon-style animation with naturalistic character voices and environment 
sounds, for instance, seems out of whack. A computer game that incorporates 
breathtakingly high-resolution, high-speed animation but only produces little beeps seems 
brain-damaged. [LAUR93] 


On analyzing the use of performed sound and music in VEs, Pressing [PRES97]
classified sound into three categories: 1) artistic expression, 2) information transfer, and
3) environmental sounds. Pressing concluded that: “Across all three categories the need
for further research on the psychological aspects of sound and performance in virtual
environments was apparent” [PRES97]. Another fidelity consideration is that “...cartoons
and caricatures, despite their drastic loss of information and fidelity, may better serve to
represent the world, clarify visual relationships...and effect our thoughts...than pictures of
high fidelity” [FRIE96]. Similarly, on integrating sounds and motions in VEs, “Sounds
tend to affect the listener in a more subconscious and impressionistic way than visual
cues” [HAHN98]. Furthermore, when considering the fidelity requirement of VEs, there
are many perspectives from which to view fidelity, perhaps all of which are correct!
Flach and Holden [FLAC98] outline the following definitions of fidelity from various
scientific perspectives.


1) Newton's Way: Fidelity is derived from three-dimensional space and time (e.g., 
chronometric analysis). 


2) Einstein’s Way: Since space and time are relative to a certain frame of reference, 
they cannot be scientifically committed to any sense of realism; therefore, space and time 
cannot be used as a measure of fidelity. 


3) Fechner’s Way: Fidelity is defined in relation to the correspondence between the
simulated world and the “real” world as measured using the ruler and clock of classical 
physics. 


4) Helmholtz’s Way: Fidelity is defined relative to the ability to simulate the 
biological mechanisms -- the proximal stimulus. Thus, binocular and binaural inputs 
might be considered essential to a high-fidelity experience of space. 


5) Broadbent's Way: Information processing rate, sensitivity, bias, and stability might 
prove the best measures of fidelity. 


6) Dewey's Way: The measure of fidelity is the degree to which the simulation 
captures the richness of natural couplings between perception and action. 


7) Gibson’s Way: With fidelity, the constraints on action take precedence over the 
constraints on perception, and reality of experience is defined relative to functionality, 
rather than to appearances. (Paraphrased from [FLAC98]) 


4. Presence 

Presence, the sense of being there, has been a heavily debated topic among VE
developers. There is no argument that the sense of presence within a VE is an extremely
vital aspect of any VE, and that “...virtual environments that are best at simulating
multiple senses are also best at evoking a feeling of presence and immersion” [ANDE97].
The debate over presence is a debate about definition and measurement. Depending on
your interpretation, there can be many possible meanings of presence. For instance, a
well-written book can cause one to be immersed in the intricacies of a good plot. A
great live theater production or cinematic movie can also stir the senses, causing a sense
of being there -- presence. In VE applications, we typically measure presence by how
well our senses (all of them) are stimulated. For “...it is both the interactivity and the
quality of the rendering that results in the immersiveness of a virtual reality or multimedia
system” [BEGA94]. Sheridan [SHERI96] makes an interesting observation that through
evolution, our senses developed in order, from tactile to vision to audition, but that
technology used to stimulate our senses has developed in reverse, from audition to vision
to tactile as depicted in Figure 23.


Figure 23. Darwinian Vs. Technological Evolution
From [SHERI96].


In VE applications, most agree that the level of presence is directly proportional
to the level of audio, visual and tactile fidelity. Accordingly, “Tight linkage between
visual, kinesthetic, and auditory modalities is the key to the sense of immersion that is
created by many computer games, simulations, and virtual-reality systems” [LAUR93].
As such, the level of fidelity is directly proportional to the level of presence. Thus, the
level of presence must be a function of fidelity. Nevertheless, most do not agree on how
to measure the level of presence. Sheridan uses the following Three Attribute Scale of
Presence to rate the fidelity of picture, sound, and tactile images.


1. Virtual image resolution (pixels or taxels per frame), refresh rate (frames per
second) and gray- or color-scale (bits per pixel or taxel) are too few to convey realism.


2. Virtual image fidelity is fairly realistic. Resolution (pixels or taxels per frame),
refresh rate (frames per second) and gray- or color-scale (bits per pixel or taxel) are
enough to convey good sense of reality.


3. Virtual image is compelling. Difficult to discriminate the virtual from the real
based on any given image. [SHERI96]




Slater and Wilbur [SLAT97] discuss various parameters affecting presence,
including the parameter of vividness as it relates to pictorial realism. They describe an
experiment using a driving simulator in which two different levels of pictorial realism
were presented to the immersed participant. The results indicated that: “There was a
significant difference in the level of reported presence between the two levels of pictorial
realism, with the more realistic resulting in a higher level of reported presence”
[SLAT97]. As a result of their research, Slater and Wilbur introduce the Framework for
Immersive Virtual Environments (FIVE), which shows the relationship to presence among
several factors including visual, auditory, and tactile displays as depicted in Figure 24.
Also, in a previous research effort [SLAT94], Slater found that a person’s dominant sense
may influence a person's sense of presence.


Figure 24. Framework for Immersive Virtual
Environments From [SLAT97].


Hendrix [HEND94] [HEND96a] [HEND96b] conducted a number of experiments
to measure the level of presence within VEs during a navigation task as a function of visual
and audio display parameters. In one set of experiments, the visual display parameters
manipulated were: 1) presence or absence of head tracking, 2) presence or absence of
stereoscopic cues, and 3) size of geometric field of view used to create the visual image
projected on the visual display. In another set of experiments, the audio display
parameters manipulated were: 1) presence or absence of spatialized sound, and 2)
nonspatialized versus spatialized sound. The results from the experiments involving
visual display parameter manipulation indicated: “...a significant positive correlation
between the reported level of presence and the fidelity of the interaction between the
virtual environment participant and the virtual world” [HEND96a]. The results from the
experiments involving audio display parameter manipulation indicated that:


...the addition of spatialized sounds significantly increased the sense of presence but
not the realism of the virtual environment. Despite this outcome, the addition of a 
spatialized sound source significantly increased the realism with which the subjects 
interacted with the sound source, and significantly increased the sense that sounds 
emanated from specific locations within the virtual environment. The results suggest that, 
in the context of a navigation task, while presence in virtual environments can be 
improved by the addition of auditory cues, the perceived realism of a virtual environment 
may be influenced more by changes in the visual rather than auditory display media. 
[HEND96b] 


As such, although spatialized sounds can increase the sense of presence within a VE, the
perception of realism in a VE is still dominated by the visual modality.


C. AUDITORY-VISUAL PERCEPTUAL ORGANIZATION 


1. Gestalt Theory 


The perception of an auditory-visual display can be considered in terms of the
Gestalt point of view. If we extend the Gestalt Factors of Perceptual Organization
discussed earlier in GESTALT THEORY (Chapter II, Section G) from visual-only
stimuli to visual and audio stimuli, the factors of Similarity, Proximity, Fixation and
Object Interdependence become particularly interesting to the possible perceptual
grouping of an auditory-visual display. The definitions of these (visual) factors are as
follows:


Similarity: If a number of elements are present in the perceptual field, those with
similar characteristics will be seen as though they are grouped together.


Proximity: Elements of the perceptual field located near one another will tend to be
seen as a group or unit.


Fixation: The organization of certain kinds of patterns clearly depends on where the 
observer fixes his attention. 


Object Interdependence: ...prevalent in the organization of complex patterns
encountered in visual experience is a tendency to group objects that are functionally 
rather than physically similar. We frequently see objects in this way if they display some 
kind of interdependent relationship. [MURC73] 


When a high-quality visual display is coupled with a high-quality auditory display, for
the intended presentation of an audio-visual display, the factor of Similarity may cause a
perceptual quality grouping of the audio-visual display. Also, through the perceptual
illusion of the ventriloquism effect, the audio portion of an audio-visual display may
perceptually emanate from the proximal locality of the visual display, perhaps causing a
perceptual grouping based on the factor of Proximity. When viewing any audio-visual
display, the observer must, at some time, fixate on the display, which in turn might cause a
perceptual grouping by the factor of Fixation. Furthermore, since it is typical to hear
music playing on a radio, music (audio) and a radio (visual) may be perceptually grouped
together through the factor of Object Interdependence.


2. Auditory Scene Analysis 

In terms of auditory-visual interaction, Al Bregman mentions in his book,
Auditory Scene Analysis: The Perceptual Organization of Sound, that there are many
similarities between visual and auditory perceptual groupings. Specifically,


... the similarity of principles of organization in the visual and auditory modalities is 
that the two seem to interact to specify the nature of an event in the environment of the 
perceiver. This is not too surprising, since the two senses live in the same world and it is
often the case that an event that is of interest can be heard as well as seen. Both senses
must participate in making decisions of “how many,” of “where,” and of “what.” 
[BREG90] 


But as opposed to the Gestalt point of view, which focuses on the similarities among
modalities, Bregman also presents an interesting ecological point of view which focuses
on the differences of the modalities.


There is a crucial difference in the way that humans use acoustic and light energy to 
obtain information about the world. This has to do with the dissimilarities in the ecology
of light and sound. In audition humans, unlike their relatives the bats, make use primarily
of the sound-emitting rather than the sound-reflecting properties of things. They use their 
eyes to determine the shape and size of a car on the road by the way in which its surfaces 
reflect the light of the sun, but use their ears to determine the intensity of the crash by 
receiving the energy that is emitted when this event occurs. The shape reflects energy; the 
crash creates it. For humans, sound serves to supplement vision by supplying information 
about the nature of events, defining the “energetics” of a situation. [BREG90] 


This difference between vision and audition is further evidenced through the use of
echoes. In audition, we are mainly interested in the direct source of sound rather than its
echoes, but we can also combine direct sound and indirect sound (echoes) to establish a
mixed sound which still conveys information of the direct sound but with the additional
properties (i.e., reverberation) of the indirect sound. However, with vision, we are mainly
concerned with the indirect image (echoes or reflections), and we are not able to combine
direct and indirect images to establish a mixed visual image. Bregman suggests that it is
these ecological differences which might cause “apparent violations of the principle of
exclusive allocation of sensory evidence” [BREG90].


D. AUDITORY-VISUAL ART FORMS AND FILM 


1. Art Forms 


In terms of the Arts, Joseph Schillinger explains the correlation of visual and 
auditory art forms through mathematics. Schillinger believed that: 


A scientific theory of the arts must deal with the relationship that develops between 
works of art as they exist in their physical forms and emotional responses as they exist in
their psycho-physiological form, i.e., between the forms of excitors and the forms of 
reaction. As long as an art-form manifests itself through a physical medium, and is 
perceived through an organ of sensation, memory and associative orientation, it is a 
measurable quantity. Measurable quantities are subject to the laws of mathematics. Thus, 
analysis of esthetic form requires mathematical techniques, and the synthesis of forms 
(the realization of forms in an art medium) requires the technique of engineering. 
[SCHI48]


Schillinger referred to the visual art form as Elements of Visual Kinetic Composition and 
the auditory art form as Elements of Music. The Elements of Visual Kinetic Composition 
consisted of the following four main components: 


1. Linear, plane and solid trajectories (distance, dimension, direction, form).

2. Illumination (forms and intensity of light).

3. Texture (density of matter, quality of surface).

4. General component: time. [SCHI48]


The Elements of Music consisted of the following five main components:


1. Frequency (pitch).

2. Intensity (relative dynamics).

3. Quality (harmonic composition).

4. Density (quantitative aggregation of sound).

5. General component: time. [SCHI48]


As such, Schillinger believed that mathematics might appropriately describe visual and
auditory correlated art forms and that “The correlation of the general component in both
art forms may be assigned to different proportionate relations, such as harmonic ratios,
distributive powers, series of growth, etc.” [SCHI48]. Some of these mathematical
relations which describe art forms are depicted in Figure 25.


[Figure 25 content: relations combining quality of matter’s surface (visual) with pitch,
relative dynamics, harmonic composition, and quantitative aggregation of sound (auditory).]


Figure 25. Combined Visual-Auditory Art Form Mathematics From [SCHI48].


Furthermore, Figure 26 depicts Schillinger’s concept of the overall relationship among
the components of a combined kinetic art form.


2. Film 

For many years, the entertainment industry has realized the important relationship
between visuals and sound. Even before sound was an integral part of film, silent movies
were accompanied by specific music to enhance the mood of certain scenes. As Gary
Rydstrom of Skywalker Sound explains:


Storytelling, mood setting, character development, drama and style can all be more
successfully realized by the careful collaboration of images and sounds. There is a
magical level reached when picture and sound work together, a creative dimension not
reached by either picture or sound alone. ...When approached creatively, the combination
of sound and image can bring something to vivid life, clarify the intent of the work, and
make the whole experience more memorable. [RYDS94]


Figure 26. Components of a Combined Kinetic Art Form From [SCHI48].


Realizing this important relationship between visuals and sound in film, Lipscomb and
Kendall [LIPS90] [LIPS94] investigated the perceptual judgement of the relationship
between musical and visual components in film. In their experiments, they took various
motion picture sequences and manipulated their soundtracks. The motion picture
sequence containing the original soundtrack, along with the same sequence
containing various manipulated soundtracks, was presented to subjects. The task of the
subject was to select the soundtrack that best fit the visuals of the film. Interestingly, the
results indicated that “the composer-intended musical score [the original score] was
identified as the best fit by the majority of subjects for all conditions” [LIPS94]. In a
related experiment, they also found significant results strongly suggesting that a musical
soundtrack can in fact change the perceived meaning of a film presentation.


E. AUDITORY-VISUAL CROSS-MODAL MATCHING


Cross-modal matching is using information obtained through one sensory 
modality to make a judgment about an equivalent stimulus from another modality. 
Lawrence Marks has been studying auditory-visual cross-modal matching over the last 
twenty-five years. He has conducted several experiments which suggest a strong 
auditory-visual cross-modal matching among brightness, pitch, and loudness. In 1974 
[MARK74], he had subjects match pure tones to the brightness of gray surfaces. His 
results indicated that most subjects matched increasing auditory pitch to increasing visual 
brightness. Marks further concludes that his findings “...mimic those of synesthesia...” 
[MARK74] (see SYNESTHESIA, Chapter II, Section H). In 1982 [MARK82], Marks
conducted a series of four experiments in which subjects used scales of loudness, pitch, 
and brightness to evaluate the meanings of various auditory-visual synesthetic metaphors 
such as: sound of sunset, murmur of dawn, and bright whisper to name a few. He found 
that loudness and pitch expressed themselves metaphorically as greater brightness, and 
likewise, that brightness expressed itself metaphorically as greater loudness and as higher 
pitch. This series of experiments led Marks to believe that: 


The ways that people evaluate synesthetic metaphors emulate the characteristics of
synesthetic perception, thereby suggesting that synesthesia in perception and synesthesia
in language both may emanate from the same source -- from a phenomenological
similarity in the makeup of sensory experiences of different modalities. [MARK82]


Marks has also conducted experiments involving auditory-visual cross-modal perception
of intensity [MARK86], auditory-visual cross-modal similarities in speeded
discrimination [MARK87], and additional experiments concerning auditory-visual
cross-modal similarities with pitch, loudness, and brightness [MARK89]. The results of these
experiments are similar to his earlier experiments and provide more evidence to support
strong auditory-visual cross-modal matching among pitch, loudness, and brightness. In
terms of cross-modal matching, one might conclude from Marks’ findings that our senses
are integrated somehow. However, Stein and Meredith offer a different point of view
based on a neurological perspective:


While cross-modal matching is clearly an intersensory phenomenon, and may involve
multisensory neurons, one could make the case that it has little to do with the integration
of inputs from different modalities per se, and that multisensory areas of the brain need
not play any special role in this process. The judgments of equivalence across modalities
could depend on the individual inputs being held in the central nervous system in
modality-specific form, so that they are independent of one another but still may be
accessed by another neural pool. [STEI93]


F. VISUAL DOMINANCE OVER AUDITION 


1. Ventriloquism Effect 


A well-known auditory-visual intersensory phenomenon is that of the
Ventriloquism Effect (see [HOWA66]). As the name implies, this phenomenon refers to
the illusion created by a skilled ventriloquist when we think we hear the dummy talking,
when in fact we are actually hearing the altered voice of the ventriloquist. Not only do we
hear the dummy talking but we actually think the sounds of the dummy are emanating
from the dummy’s mouth and not from the ventriloquist even though we know that the
dummy cannot really talk, as depicted in Figure 27. This effect demonstrates the strong
spatial coupling that occurs between the auditory and visual senses, and as a result has
been the topic of much research (see [HOWA66] [PICK69] [BERM76] [RADE76]
[WARR81] [RAGO88] [STEI93]). One reason why the ventriloquism effect occurs is
that the visual sense is usually the dominant sense as discussed earlier in Visual
Dominance (Chapter II, Section E). As a result, “...unless there are dramatic differences
in the intensities of different stimuli, the visual effect on the information generated in
most other sensory systems is greater than their effect on visual perception” [STEI93].
Therefore:


...if visual stimuli are appearing at the same frequency and providing information of 
the same general type or importance as auditory or proprioceptive stimuli, biases toward 
the visual source at the expense of the other two [auditory and proprioceptive] will be 
expected [WICK92]. 


Figure 27. The Ventriloquist From [STEI93]. 


2. Experimental Results Supporting the Ventriloquism Effect 

Radeau and Bertelson [RADE76] conducted an experiment on the effect of a 
textured visual field on modality dominance during the ventriloquism effect. The results 
indicated that “...visual texture affects the degree of auditory capture of vision, but not the 
degree of visual capture of audition...” [RADE76]. Bermant and Welch [BERM76] 
investigated the effect of degree of separation of an audio-visual stimulus and eye 
position upon the spatial interaction of the ventriloquism effect. One of the more 
interesting results of this study was that “...the ventriloquism effect is not dependent on 
the use of a visual source which has been experimentally associated with the production 
of sounds” [BERM76]. The role of auditory-visual compellingness in the ventriloquism 
effect was studied by Warren et al. [WARR81] where it was found that given a highly 
compelling stimulus situation, “...subjects showed a very high visual bias of audition, a 
significant auditory bias of vision, and a sum of bias effects that indicated that their 
perception was fully consonant with the assumption of a single perceptual event” 
[WARR81]. Ragot et al. [RAGO88] explored auditory and visual ventriloquism 
reciprocal effects. Their findings suggested that “...visual dominance appears when 
attention is divided between visual and auditory modalities, but seems to be absent...when 
the subjects are asked to attend to one modality while knowing the other” [RAGO88]. 
Knudsen and Brainard [KNUD95] present neurological evidence from studying the optic 
tectum (also referred to as the superior colliculus). This evidence explains the 
ventriloquism effect supporting visual dominance over audition. They conclude that: 


The angular [spatial] distance that can separate visual and auditory stimuli and still 
result in facilitatory interactions in tectal neurons depends on the sizes of their visual and 
auditory receptive fields. Because visual receptive fields are consistently smaller than 
auditory receptive fields,...bimodal tectal neurons are more sensitive to displacements of 
a visual stimulus from its optimal location than to displacements of an auditory stimulus. 
As a consequence, the site in the bimodal tectal map that is activated by visual and 
auditory stimuli should be more sensitive to the location of the visual stimulus than to the 
location of the auditory stimulus. [KNUD95] 


Knudsen and Brainard believe that the behavioral correlates of this neurological evidence 
support increased sensitivity and localization activity when stimuli contain both visual 
and auditory components. Figure 28 depicts the hypothetical neural representations on the 
tectal surface that occur with spatially separate auditory and visual stimuli. 
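
The visual bias described by Knudsen and Brainard can be illustrated numerically. The 
following is a minimal sketch and is not the model from [KNUD95]: it assumes that each 
unimodal response is a Gaussian activity profile over azimuth and that the bimodal 
response is proportional to the product of the two profiles. The function name, the 
receptive-field widths, and the stimulus locations are hypothetical and were chosen 
only for illustration. 

```python
def combined_peak(visual_loc, visual_width, auditory_loc, auditory_width):
    """Peak location of the product of two Gaussian activity profiles.

    Assumption (not taken from [KNUD95]): each profile is Gaussian with the
    given centre (degrees of azimuth) and width (standard deviation, degrees).
    The product of two Gaussians peaks at the precision-weighted mean of
    their centres, so the narrower profile pulls the peak toward itself.
    """
    visual_weight = 1.0 / visual_width ** 2
    auditory_weight = 1.0 / auditory_width ** 2
    weighted_sum = visual_weight * visual_loc + auditory_weight * auditory_loc
    return weighted_sum / (visual_weight + auditory_weight)


# Hypothetical values: a narrow visual field centred at 0 degrees and a
# broad auditory field centred at 20 degrees.
peak = combined_peak(visual_loc=0.0, visual_width=5.0,
                     auditory_loc=20.0, auditory_width=15.0)
print(f"combined peak at {peak:.1f} degrees")
```

Under these assumptions the combined peak lies between the two unimodal peaks but much 
closer to the visual location, which matches the qualitative pattern described above. 
With equal widths the peak would fall at the midpoint. 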


3. Auditory-Visual Divided Attention Experimental Findings 


During signal detection (temporal in nature and typically associated with 
sustained attention or vigilance), the auditory channel proves dominant over the visual 
channel, which is why warning signals are typically produced with auditory devices (see 
APPENDIX B. AUDITORY-VISUAL CROSS-MODAL SIGNAL DETECTION AND 
VIGILANCE BIBLIOGRAPHY). However, in most other areas, our visual sense 
dominates the hearing sense, as can be seen from the following experimental findings. 

Hypothetical neural representations of spatially separate visual and auditory stimuli 
(bottom), schematically illustrated on a plane representing the tectal surface. The relative activity of 
different tectal loci is indicated by the relative height above the plane. Neurons located outside of 
the zones of excited neurons are inhibited (not shown) by the stimulus. Top: A frontal visual stimulus 
results in a sharp peak of activity centered in the rostral (R) tectum. Middle: An auditory stimulus 
located more peripherally results in a peak of activity centered further caudal (C) in the tectum. The 
peak is broader because auditory receptive fields are much larger than visual receptive fields. 
Bottom: The combination of visual and auditory stimuli results in a single peak of activity located 
between the peaks for the unimodal stimuli but biased towards the location at which the visual 
stimulus was represented. 


Figure 28. Hypothetical Neural Representation of Auditory and 
Visual Stimuli on the Tectal Surface From [KNUD95]. 


In 1954, the United States Air Force released an extensive technical report which 
compared the visual and auditory senses as channels for data presentation during cockpit 
crew coordination [HENN54]. As mentioned in this report: 


The evidence seems to indicate that when a person is required to divide his attention 
or to shift back and forth between two tasks, one visually controlled, the other aurally 
controlled, either task can be made a “priority” task at the expense of the other. Sense 
channel as such does not determine this priority. 


One of the conclusions of this report indicated that there was little experimental evidence 
comparing audition and vision as channels for data presentation. The Air Force found that 
“The majority of the studies have been concerned with receptor processes and sensory 
thresholds rather than with perceptual phenomena” [HENN54]. Ultimately, the Air Force 
recognized: 


...the many practical difficulties that have stood in the way of directly comparing 
these two sense modalities [audition and vision] in the experimental laboratory. It has not 
thus far been possible to establish common dimensions along which to locate comparable 
visual and auditory stimuli. Furthermore, different psychophysical procedures must 
frequently be employed in comparing the two modalities (largely because of the 
temporal-sequential character of auditory stimuli). As a consequence, it is not possible to 
compare directly auditory and visual judgments with broad generality and high degree of 
practicability. [HENN54] 


Francis Colavita [COLA74] describes a series of experiments exploring sensory 
dominance in which subjects responded to suprathreshold auditory and visual stimuli. 
The auditory stimuli consisted of tones and the visual stimuli consisted of light flashes. 
The stimuli were randomly presented as auditory-only, visual-only, and combined 
auditory-visual. The subject’s task was to identify which stimuli occurred. When subjects 
were presented with the combined auditory-visual stimuli, they typically 
responded only that a visual light flash occurred, and usually did not even notice that an 
auditory stimulus (tone) was present. Thus, in this task, the findings suggest visual 
dominance over the auditory sense. 

In a study investigating the perceived duration of auditory and visual intervals, 
Behar et al. [BEHA74] found that auditory intervals (white noise) were consistently 
judged to be about 20% longer than visual intervals (light from a neon glow-lamp) of the 
same duration. This finding “...calls attention to the contribution of peripheral variables 
and indicates that they must not be ignored in accounting for psychophysical judgments” 
[BEHA74]. 

Burrows and Solomon [BURR75] conducted an experiment investigating the 
ability to scan auditory and visual information in parallel. Subjects were presented with 
pairs of letters, one being a visually presented letter and the other being an aurally 
presented letter. The pairs of letters were presented simultaneously or sequentially. The 
subjects’ efficiency of memory retrieval was measured in both conditions: 1) 
simultaneously presented letters or 2) sequentially presented letters. Their results 
indicated that: 


Parallel scanning is possible with a simultaneous presentation but not with sequential 
presentation. In retrospect, this is not surprising. The simultaneous condition provides the 
opportunity for two, modality specific, continuous records of the auditory and visual 
stimuli, unbroken by switches to another modality. In the sequential condition, the record 
for each modality must contain “dead time” whenever a switch to the other mode of 
presentation takes place. [BURR75] 


Egeth and Sager [EGET77] explored the locus of visual dominance over audition 
in which subjects responded to suprathreshold stimuli consisting of an audio-only tone, a 
visual-only light flash, and a combined auditory-visual tone-light flash. Their findings 
suggest that: 


...sensory or perceptual processing of the [auditory] tone is not affected by the light, 
i.e., that visual dominance is nonsensory in locus and depends on the relevance of the 
[visual] light stimulus. This interpretation was reinforced by other findings which showed 
that the degree of visual dominance was sensitive to the probability of light, tone, and 
light-plus-tone trials and to instructions to attend to a specific modality, but was not 
sensitive to the intensity of the light. [EGET77] 


Jones and Kabanoff [JONE75] conducted an experiment to determine if eye 
movements are a factor in auditory localization. Jones and Kabanoff based this research 
on the hypothesis that “...intersensory effects depend upon anatomical linkages of the 
different sensory areas via the motor cortex, which may serve to integrate neural activity 
by sampling the state of the different sensory receptors” [JONE75]. They found that 
auditory localization accuracy is increased if the subject moves his eyes in the direction 
of the intended target. Their findings suggest that “...voluntary eye movement rather than 
a visual map is likely to provide the framework for spatial judgments” [JONE75]. 

McGurk and MacDonald [MCGU76] investigated the effect of seeing certain lip 
movements associated with hearing contradictory speech sounds. Subjects were presented 
auditory-only speech sounds and mismatched auditory-visual (speech-lip movements) 
combinations. Their results were remarkable. During the combined auditory-visual 
mismatches, most subjects were convinced they were hearing what they were seeing (lip 
movements), when in fact the lip movements were not the correct lip movements for the 
associated speech sound that they were hearing. Furthermore, even if one has prior 
knowledge of the auditory-visual mismatches, it does not preclude one from being 
convinced they were hearing what they were seeing (incorrectly). The results of this 
experiment were so strong that it is commonly referred to as the McGurk Effect. It is 
interesting to note that “...the sight of lip movement actually modifies activity in the 
auditory cortex. By whatever mechanisms the visual cue actually enhances the processing 
of auditory inputs, it is the functional equivalent of altering the signal-to-noise ratio of the 
auditory stimulus by 15-20 decibels...” [STEI93]. 
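
To give a sense of scale for the 15-20 decibel figure quoted above, the short sketch 
below converts a level difference in decibels into the equivalent power and amplitude 
ratios using the standard decibel definitions. The function names are illustrative and 
do not come from [STEI93]. 

```python
def db_to_power_ratio(decibels):
    """Convert a level difference in decibels to a power ratio."""
    return 10.0 ** (decibels / 10.0)


def db_to_amplitude_ratio(decibels):
    """Convert a level difference in decibels to an amplitude (pressure) ratio."""
    return 10.0 ** (decibels / 20.0)


# The two ends of the range quoted in the text.
for gain_db in (15.0, 20.0):
    power = db_to_power_ratio(gain_db)
    amplitude = db_to_amplitude_ratio(gain_db)
    print(f"{gain_db:.0f} dB: power x{power:.1f}, amplitude x{amplitude:.2f}")
```

A 15 dB change corresponds to roughly a 32-fold change in signal-to-noise power ratio, 
and a 20 dB change to a 100-fold change. 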

Rosenblum and Fowler [ROSE91] investigated if loudness judgements of speech 
are more closely related to the visual degree of exerted vocal effort than to the actual 
emitted acoustical properties of intensity. As in the McGurk Effect, subjects were 
presented conflicting audio-visual stimuli. Their findings suggest that when making 
loudness judgements of speech, the visual cues of vocal effort significantly outweigh the 
cues provided by the appropriate levels of acoustic intensity. 

Massaro and Warner [MASS77] conducted an experiment which investigated 
divided attention between auditory and visual perception. In their experiment, subjects 
were asked to recognize test tones and test letters under selective and divided attention. 
They concluded that “...the degree of capacity limitations and attentional control during 
visual and auditory perception is small but significant” [MASS77]. 

Hanson [HANS81] conducted an experiment to investigate if common processing 
of semantic, phonological, and physical systems were involved during reading and 
listening. Subjects were simultaneously presented two words, one visually and one 
aurally, but were instructed to attend to only one modality and to make responses based 
on that attended modality. Her results indicated that the unattended words had an 
influence on semantic and phonological decisions, but had no influence on the physical 
task. (In the physical task, the visual words were presented in either small or capital 
letters and the aural words were presented in either a male or female voice.) Hanson 
concludes that the written and spoken words “share semantic and phonological 
processing but have separate modality-specific codes that operate on information prior to 
the convergence of information from visual and auditory inputs” [HANS81]. 


G. AUDITORY-VISUAL THRESHOLD PERCEPTION 


The body of evidence presented thus far clearly indicates that under certain 
conditions, auditory-visual perceptual phenomena do exist. In fact, most auditory-visual 
research has focused on threshold levels, absolute sensitivity, or 
just-noticeable-differences (JND). Gilbert [GILB41] and Ryan [RYAN40] independently conducted 
exhaustive literature surveys covering these topics and a summary of their findings was 
presented earlier in Sensory Interaction (Chapter II, Section C). Additional evidence 
supporting auditory-visual perceptual phenomena from threshold level stimuli can be 
found in the following references: [SERR35] [PRAT36] [LOND54] [THOM58] 
[LOVE70]. Nevertheless, for a better understanding of this type of research, the findings 
of two experiments are presented showing auditory-visual perceptual phenomena from 
threshold-level stimuli. 

An example of the research reviewed by Gilbert and Ryan is that of Kravkov 
[KRAV36], one of the early pioneers in the area of intersensory experimentation. 
Kravkov’s experiment investigated the influence of sound upon the light and color 
sensitivity of the eye. In this experiment three female subjects were presented an auditory 
stimulus consisting of a 2100 Hz tone at 100 decibels for a duration of about 10 minutes. 
During these 10 minutes, measurements were made of color and light sensitivity. The 
results are as follows: 


1. The rod sensibility of the eye decreases under the influence of simultaneous sound. 


2. The colour sensibility of the eye changes differently under the influence of sound, 
according to the wavelength of the stimulating light. ... Whereas the colour sensibility for 
green rises during the acoustic stimulation the colour sensibility for orange-red decreases. 
[KRAV36] 


In 1952, Gregg and Brogden [GREG52] conducted an experiment on the effect of 
simultaneous visual stimulation on absolute auditory sensitivity. In their experiment 
subjects were presented an auditory tone along with an auxiliary light source. Their 
results indicate that when subjects were asked to report the presence of a visual light 
source along with an auditory tone, the light stimulus decreased subject sensitivity to a 
1000 Hz tone. However, when subjects were only required to report the presence of an 
auditory tone, the light stimulus increased sensitivity to the auditory tone. 


H. AUDITORY-VISUAL SUPRATHRESHOLD PERCEPTION 


This section presents the motivation and findings of those experiments in which 
suprathreshold auditory stimuli influenced visual perceptual quality, fidelity, or 
resolution; and/or suprathreshold visual stimuli influenced auditory perceptual quality, 
fidelity, or resolution. These experimental findings are of primary interest and directly 
support the motivation for this dissertation. 


1. Motivation 


When one talks about using both audio and visual displays for some kind of 
simulation, game, VE, etc., some people will say that the use of high-quality sound 
positively influences their perception of the visual images. For example, Brenda Laurel 
states that: “...in the game business we discovered that really high-quality audio will 
actually make people tell you that the games have better pictures, but really good pictures 
will not make audio sound better; in fact, they make audio sound worse” [TIER93]. Why 
is this? The reason is probably because simulations, games, VEs, etc., all started out as 
having only visuals, and then added sounds later. The addition of the sounds, then, adds 
to the overall perception of the experience. As a result, the visuals appear better. It is also 
interesting to note that the reverse is usually not reported, namely that the use of high-quality 
visual images positively influences perception of auditory displays. Why is this? Again, 
the answer is probably because we are used to games based on the visual displays. 
However, if games started out as audio only and then added visuals later, then perhaps 
the addition of high-quality visual displays might positively influence subject perception 
of the auditory displays. Unfortunately, few examples exist to help analyze this hypothesis. 

As described earlier in Sensory Interaction (Chapter II, Section C), there are 
various theories about sensory interaction. In terms of auditory-visual sensory interaction, 
in particular, studies of infants have revealed evidence that there exists a: 


...spatially organized, functional relation between auditory and oculomotor systems 
from birth. This coordination may be enhanced by intrinsic spatial properties of the visual 
system that act to ensure auditory and visual colocation. Such a functional relation might 
in turn facilitate the detection of intermodal equivalence, since sounds are usually 
accompanied by sights. [BUTT81] 


Stein and Meredith theorize that “combinations of, for example, visual and auditory cues 
can enhance one another and can also eliminate any ambiguity that might occur when 
cues from only one modality are available” [STEI93]. Murch believes that “under many 
conditions the encoding of strictly visual material or strictly auditory material involves 
the use of short-term storage of both systems” [MURC73]. Since auditory and visual 
displays can influence each other, then as Durand Begault suggests, “...another solution 
for improving the immersivity and perceived quality of a visual display and the virtual 
simulation in general is to focus on other perceptual senses -- in particular, sound” 
[BEGA94]. For example, Negroponte recounts the following story of designing military 
tank simulators: 


In the design of military tank trainers, considerable effort was made to have the 
highest achievable display quality (at almost any cost), so that looking at the display was 
as close to looking out the window of a tank as possible. Fine. Only after painstaking 
endeavors to keep increasing the number of scan lines did the designers think to introduce 
an inexpensive motion platform that vibrated a little. By further including some 
additional sensory effects -- tank motor and tread sounds -- so much realism was 
achieved that the designers were then able to reduce the number of scan lines; they 
nonetheless exceeded the requirement that the system look and feel real. [NEGR95] 


However, the empirical evidence supporting how auditory and visual displays can 
influence the quality perception of each other is lacking. One reason for the lack of 
empirical evidence is that “...the first problem in comparing vision and hearing is of 
specifying perceptually relevant dimensions for both modalities, a problem which still 
resists truly satisfactory solution” [JONE81]. Nevertheless, after an exhaustive literature 
review, the following experiments present the only findings in which auditory displays 
influenced the quality perception of visual displays or visual displays influenced the 
quality perception of auditory displays. 


2. Experimental Results 


W. Russell Neuman [NEUM90] [NEUM91] conducted an experiment to measure 
the effect of changes in audio quality on visual perception on High-Definition Television 
(HDTV). The experimental design was to keep the quality of the visual stimuli constant, 
while only manipulating the auditory stimuli. The auditory conditions were as follows: 
low fidelity (very low-quality speaker system) vs. high fidelity (very high-quality speaker 
system); monaural vs. stereo; and three types of television programming: sports, situation 
comedy, and action-adventure. Subjects were presented a short video clip along with one 
of the auditory conditions. The subjects were then asked to rate 1) their liking, 2) their 
level of interest, 3) their psychological involvement in the programming, 4) picture 
quality, and 5) audio quality. Their results indicated that subjects “...had a difficult time 
distinguishing mono from stereo and even low-fidelity from high-fidelity sound. ...[and] 
video with better quality and stereo sound were consistently rated as more likable, 
interesting, and involving” [NEUM91]. Perhaps the most interesting finding was that a 
few subjects perceived an increase in visual quality when coupled with better audio even 
though the visual quality remained constant throughout the experiment. This finding, 
however, was not statistically significant and it only occurred in one of the three 
presented types of television programming. 

Iwamiya [IWAM92] investigated the effect of visual information on the 
impression of sound and the effect of auditory information on the impression of visual 
images when listening to music via audio-visual media. The factors used to evaluate the 
impression of both audio and visual images were: tightness, evaluation, brightness, 
uniqueness, and cleanness. “These factors are considered to be the intermodalities 
between auditory and visual processing” [IWAM92]. Iwamiya found that the factors of 
brightness, tightness, and cleanness of the auditory images enhanced the perception of 
brightness, tightness, and cleanness of the visual images. Iwamiya concludes that: “The 
better the matching of sound and image, the higher the evaluation of auditory and visual 
impression. This kind of synergetic interaction is controlled by the feedback loop from the 
total integrated impression of auditory and visual information” [IWAM92]. 

Hollier and Voelcker [HOLL97] conducted an experiment investigating the 
influence of video quality on audio perception. Thirty-two subjects watched video clips 
10 seconds in duration with supporting audio (speech) commentaries. In total there were 
eight video quality variations and four audio quality variations. Their results indicated 
that 1) when no video was present, the perceived audio quality was always worse than if 
video was present, and 2) although only small differences were noted, a decrease in video 
quality corresponded to a decrease in perceived audio quality. They ultimately propose an 
algorithmic approach for the proper development of an auditory-visual cross-modal 
perceptual model depicted in Figure 29. In their final discussion of the experiment, 
Hollier and Voelcker state that “for a majority of applications both in the 
communications and entertainment industry separate evaluation of audio or video quality 
is likely to become of limited value” [HOLL97]. 


Figure 29. Auditory-Visual Perceptual Model From [HOLL97]. 

Two companion papers by Woszczyk et al. [WOSZ95] and Bech et al. [BECH95] 
discuss the design and results of an experimental procedure examining the interaction 
between the auditory and visual modalities in the context of a home theater system. Their 
approach acknowledges that “...experiments involving both modalities require a novel 
approach that recognizes domains of cooperative interaction between the senses” 
[WOSZ95]. With the growing interest and development of virtual reality systems, 
Woszczyk identifies the need for testing the interaction of audio and visual displays in 
order to bring about “substantial improvements in the integration of various audio and 
video parts of these [virtual reality] systems, and thereby provide important perceptual 
benefits that enhance [the] audio-visual experience of the viewers” [WOSZ95]. The 
testing of audio-visual interaction is critical because “Auditory and visual channels work 
both independently and in mutual cooperation on both cognitive and sensory levels of 
perception” [WOSZ95]. In order to study the interaction between the audio and visual 
sensory modalities “it is necessary to focus on the total experience and not on the two 
modalities individually” [BECH95], which supports Woszczyk et al.’s observations that 
“The matching of auditory and visual data triggers perceptual synergy between 
modalities and promotes intermodal fusion” [WOSZ95]. In their experiments, subjects 
assessed audio-visual reproductions using the subjective dimensions of action, space, 
mood, and motion and were asked specific questions focusing on quality, magnitude, degree 
of involvement, and audio-visual balance. Quality was defined as: distinctness, clarity, 
and detail of the impression. One of their findings of particular interest is that both visual 
and audio perceived quality increased with increasing screen size. To further explore 
auditory-visual interaction, Bech conducted two more experiments to investigate the 
influence of stereophonic (audio) width on the perceived quality of an audio-visual 
presentation using multichannel surround sound systems. During the experiments, the 
subjects were asked to evaluate the quality (fidelity) of the spatial information contained 
in audio-visual reproductions. The results indicate that “the quality of [perceived] spatial 
reproduction increases linearly with an increase in the stereophonic [audio] width” 
[Bee]. 

Hugonnet [HUGO97] presents what he considers to be a new concept of spatial 
coherence between sound and picture in stereophonic TV production. “From a cultural 
and historical point of view, our perception of sound corresponding to image has 
remained monophonic” [HUGO97]. As such, Hugonnet describes methods of production 
and post-production to achieve spatial coherence of stereo sound with various TV content 
including: talk shows with two people, talk shows with more than two people, concerts, 
sports, and drama. He found that when people were first exposed to stereo sound while 
watching TV, they found the relation between visual and auditory images strange and 
not very comfortable. However, once people became accustomed to the stereo sound, if 
they were re-exposed to mono sound, they perceived the mono sound to be 
of lower quality. Hugonnet concludes by recognizing the importance of 
auditory-visual interaction and states: “It is up to us to bring about a radical change in audiovisual 
perception, where sound will gain its right place, on a par with the visual image” 
[HUGO97]. 


I. SUMMARY 


In summary, this chapter has provided an overview of Virtual Environments, 
Auditory-Visual Perceptual Organization, Auditory-Visual Art Forms and Film, 
Auditory-Visual Cross-Modal Matching, Visual Dominance over Audition, 
Auditory-Visual Threshold Perception, and Auditory-Visual Suprathreshold Perception. 


IV. EXPERIMENTAL DESIGN OVERVIEW 


A. INTRODUCTION 


This chapter describes the motivation and initial considerations that led to the 
development of the experimental design used to gather empirical evidence supporting 
suprathreshold auditory-visual cross-modal quality perception phenomena. The various 
considerations outlined in this chapter were instrumental in developing the experimental 
design of the pilot study which ultimately led to the three main experiments forming the 
foundation of this dissertation. The experimental design details of the pilot study and 
three main experiments are described in greater detail in the next four chapters. Thus, the 
intent of this chapter is not to focus on details, but rather to provide an overview of the 
choices that were considered during the initial experimental design development. 


B. MOTIVATION 


Based on the findings from the exhaustive background and literature review 
outlined in the previous two chapters, the following are some key observations: 


1) There is neurological and physiological evidence supporting auditory-visual 
cross-modal perception phenomena. 


2) There is psychological and psychophysical evidence supporting auditory-visual 
cross-modal perception phenomena. 


3) There is empirical evidence supporting the ability to divide attention between 
audition and vision. 


4) There is empirical evidence suggesting that sound can influence the perceived 
mood of motion pictures. 


5) There is empirical evidence supporting auditory-visual cross-modal perception 
phenomena concerning increased sensitivity/acuity in audition and/or vision. 


6) There is a need to enhance multimedia and VE development through better 
understanding of auditory-visual cross-modal perception phenomena. 


7) There is a lack of empirical evidence supporting auditory-visual cross-modal 
perception phenomena in which suprathreshold auditory stimuli influenced visual 
perceptual quality and suprathreshold visual stimuli influenced auditory perceptual 
quality. 


Based on these key observations, which stem from wide-ranging interdisciplinary 
research, there is a need for empirical evidence supporting suprathreshold auditory-visual 
cross-modal quality perception phenomena. The ultimate goal of this dissertation is to answer 
the following question: In an audio-visual display, what effect (if any) do various audio 
quality levels have on the perception of visual quality and various visual quality levels 
have on the perception of auditory quality? The following are some specific derivations 
of this question: 


1) Are changes in the audio and/or visual qualities of an audio-visual display 
perceivable and can these changes be attended to also? 


2) Does a high-quality auditory display coupled with a low-quality visual display 
cause a decrease/increase in the perception of audio quality and/or an increase/decrease in 
the perception of visual quality relative to established baseline conditions derived from 
auditory-only and visual-only quality perception evaluations? 


3) Does a low-quality auditory display coupled with a high-quality visual display 
cause an increase/decrease in the perception of audio quality and/or a decrease/increase in 
the perception of visual quality relative to established baseline conditions derived from 
auditory-only and visual-only quality perception evaluations? 


4) Does a low-quality auditory display coupled with a low-quality visual display 
cause a decrease/increase in the perception of audio quality and/or a decrease/increase in 
the perception of visual quality relative to established baseline conditions derived from 
auditory-only and visual-only quality perception evaluations? 


5) Does a high-quality auditory display coupled with a high-quality visual display 
cause an increase/decrease in the perception of audio quality and/or an increase/decrease 
in the perception of visual quality relative to established baseline conditions derived from 
auditory-only and visual-only quality perception evaluations? 


In order to answer these questions concerning auditory-visual perceptual 
phenomena, the approach taken was to conduct an experiment to facilitate measuring 
responses to various auditory-visual suprathreshold stimuli. The overall design of the 
experiment consists of three main portions: 1) visual-only displays, 2) auditory-only 
displays, and 3) combined auditory-visual displays. During the visual-only portion, 
subjects are presented visual displays and are then asked to rate their visual quality. 
During the auditory-only portion, subjects are presented auditory displays and are then 
asked to rate their auditory quality. During the combined auditory-visual portion, subjects 
are presented combined auditory-visual displays, and are then asked to rate the quality of 
both the auditory portion and visual portion of the combined auditory-visual display. The 
goal is to compare the subject’s quality ratings made during the visual-only and 
auditory-only portions with the subject’s visual and auditory quality ratings made during 
the combined auditory-visual portion. The results of this comparison are analyzed to answer 
the questions of interest, and as such are the quintessential contribution of this 
dissertation. The initial design considerations of this experiment are now presented. 
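The comparison between unimodal and combined ratings can be sketched in a few lines of Java. This is only an illustration, not the analysis code used in this dissertation: the class name `CrossModalShift`, its methods, and the sample ratings are hypothetical, and a 1-to-7 rating scale is assumed.

```java
/** Hypothetical sketch: compares unimodal baseline ratings with ratings made in a combined display. */
final class CrossModalShift {

    /** Arithmetic mean of a set of ratings (assumed to lie on a 1-7 scale). */
    static double mean(int[] ratings) {
        if (ratings.length == 0) {
            throw new IllegalArgumentException("no ratings supplied");
        }
        double sum = 0.0;
        for (int rating : ratings) {
            sum += rating;
        }
        return sum / ratings.length;
    }

    /**
     * Shift in perceived quality: mean combined rating minus mean baseline rating.
     * A positive value means the display was rated higher when coupled with the other modality.
     */
    static double shift(int[] baseline, int[] combined) {
        return mean(combined) - mean(baseline);
    }

    public static void main(String[] args) {
        // Made-up ratings of one visual display, for illustration only.
        int[] visualOnly = {4, 5, 4, 5};
        int[] visualInCombined = {5, 6, 5, 6};
        System.out.println("Shift in visual rating: " + shift(visualOnly, visualInCombined));
    }
}
```

A positive shift for the visual ratings would correspond to an auditory display raising perceived visual quality; whether such a shift is statistically meaningful is a separate question for the data analysis.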


C. DESIGN CONSIDERATIONS 


1. Software and Hardware 


The first key consideration in the experimental design is that the experiment be 
automated. The goal is to create a computer program that can render visual-only, 
auditory-only, and combined auditory-visual displays while also capturing the required 
responses of the subject. An automated experiment is chosen since it helps to produce 
identical testing conditions, thereby reducing any potential confounds (i.e., confounding 
factors) that might arise through human error. Keeping in mind the self-imposed 
limitations described earlier in LIMITATIONS (Chapter I, Section E), the software 
chosen for the experiment consisted of HTML, Java, JavaScript, and VRML (all freely 
downloadable). The basic idea is to have the entire experiment contained within an 
HTML browser window as depicted in Figure 30. The visual-only, auditory-only, and 
combined auditory-visual displays could then be rendered via JavaScript and/or VRML 
within the main HTML window. The subjects’ responses are then obtained with rating 
scales using Java pop-up windows as depicted in Figure 31. Furthermore, based on the 
software utilized, and keeping in mind the limitations of this dissertation, a personal 
computer (PC) was used for all experiments. The specifics of the software and hardware 
used are explained in greater detail during the description of the pilot study and the three 
main experiments in subsequent chapters. 


Figure 30. Netscape HTML Browser Window. 


Figure 31. Java Pop-up Visual Display Rating Scale. 


2. Visual Displays 


Important considerations in the development of this experiment include choosing 
the rendering, type/content, and quality manipulation parameters of the visual displays. 
The possible rendering choices of the visual displays considered were: 17-inch computer 
monitor, 20/21-inch computer monitor, 28-inch computer monitor, large-screen TV, and 
triple large-screen TVs. Because of fidelity considerations and amount of available 
controlled laboratory space, the TVs were not utilized. The high cost of the 28-inch 
monitor precluded its use, and the 17-inch monitor proved to be too small. As a result, a 
20-inch computer monitor was selected to render all the visual displays. 

Choosing the type and content of the visual display was perhaps the most difficult 
task during the development of the experiment. Possible types of visual displays 
considered included: static (still image) or dynamic (motion video, user-controlled 
navigation in 2D space, or user-controlled navigation in 3D space). To reduce the 
excessive computational requirements of motion video, to reduce frame rate 
synchronization errors with associated auditory displays, and to reduce user-computer 
interaction training and variations associated with user-controlled navigation, static 
images were chosen as the display type. Once the decision was made to use static visual 
displays, the next difficult task was to choose the content. After considering numerous 
sources, two visual displays were chosen: 1) a radio and 2) a scene depicting a bowl of 
fruit and flowers. Figure 32 and Figure 33 depict (in color) the radio and fruit-flower 
scene respectively. The rationale for the choice of content of these displays will be 
explained in greater detail during the description of the pilot study and three main 
experiments in subsequent chapters. 

Figure 32. Color Visual Display of Radio. 


Figure 33. Color Visual Display of Fruit-Flower Scene. 


Once the choice of rendering and type/content of the visual displays was 
determined, the quality-manipulation parameters were selected. Since the results of this 
research effort will hopefully benefit multimedia and VE development, pixel resolution 
and noise level were chosen as the quality parameters to be manipulated. Selecting pixel 
resolution is perhaps the most prevalent decision in creating visual scenes for any VE. 
Increasing pixel resolution corresponds to an increase in realism at the expense of 1) an 
increase in rendering time, 2) an increase in storage requirements, and 3) an increase in 
download time (if networked). Thus, the VE developer must carefully consider the 
amount of required pixel resolution. Noise level, the other parameter, was chosen based 
on similar considerations as pixel resolution when one considers quality levels of MPEG 
video. High-quality MPEG video has a greater signal-to-noise ratio than low-quality 
MPEG video. Thus, a lower-quality visual image will have a greater noise level than that 
of a higher-quality image. Another factor for using noise level was based on the visual 
display’s eventual coupling with an auditory display, which is explained in the next 
section. A final consideration in the choice of visual displays was the ability to produce 
the various required quality levels. For example, if a potential quality metric cannot be 
produced due to software or hardware constraints, then that quality metric is not feasible. 
Since Adobe Photoshop [ADOB98] was utilized, its capabilities provided the limits of 
possible quality parameter manipulation. As such, all the visual displays used throughout 
all the experiments were developed using Adobe Photoshop. 
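The trade-off noted above between pixel resolution and storage or download cost can be made concrete with a rough calculation. The Java sketch below is illustrative only and rests on stated assumptions: uncompressed 24-bit color (3 bytes per pixel), a hypothetical 10 x 8 inch image, and a hypothetical 56,000 bit/s network link. The class and method names are likewise invented for this example.

```java
/** Hypothetical sketch: rough storage and download cost of an uncompressed image. */
final class ImageCost {

    /** Uncompressed size in bytes for the given physical size and resolution. */
    static long uncompressedBytes(double widthInches, double heightInches,
                                  int pixelsPerInch, int bytesPerPixel) {
        long widthPixels = Math.round(widthInches * pixelsPerInch);
        long heightPixels = Math.round(heightInches * pixelsPerInch);
        return widthPixels * heightPixels * bytesPerPixel;
    }

    /** Transfer time in seconds over a link of the given speed. */
    static double downloadSeconds(long bytes, double bitsPerSecond) {
        return bytes * 8.0 / bitsPerSecond;
    }

    public static void main(String[] args) {
        double linkBitsPerSecond = 56000.0; // assumed link speed
        for (int pixelsPerInch : new int[] {200, 400, 600}) {
            long bytes = uncompressedBytes(10.0, 8.0, pixelsPerInch, 3);
            System.out.printf("%d pixels/inch: %d bytes, about %.0f s to download%n",
                    pixelsPerInch, bytes, downloadSeconds(bytes, linkBitsPerSecond));
        }
    }
}
```

Because both image dimensions scale with resolution, doubling the pixels per inch quadruples the uncompressed size, which is why the resolution decision weighs so heavily on storage and download time.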


3. Auditory Displays 

Equally important considerations in the development of this experiment were 
choosing the fidelity, rendering, content, and quality manipulation parameters of the 
auditory displays. The possible fidelity choices of the auditory displays considered were: 
monophonic, stereophonic, and spatialized. The rendering possibilities of the auditory 
displays considered were: headphones, left and right small computer speakers, left and 
right high-fidelity speakers, quad configuration of high-fidelity speakers, and 
surround-sound configuration of high-fidelity speakers. In order to minimize any potential 
experimental confounds due to varying room acoustics, headphones were chosen to 
render the auditory displays. Similarly, to minimize any unforeseen confounds from 
using stereophonic or spatialized sound, monophonic fidelity was chosen for all auditory 
displays. Another reason for choosing monophonic audio fidelity was the static 
nature of the visual displays. Once the decision was made to use monophonic auditory 
displays, the next difficult task was to choose the content. After considering numerous 
possibilities, music was chosen as the content of the auditory displays. The rationale for 
using music as the content of the auditory displays will be explained in greater detail 
during the description of the pilot study and three main experiments in subsequent 
chapters. Once the choice of fidelity, rendering, and content of the auditory displays was 
determined, the quality manipulation parameters were selected. 

As stated earlier, since the results of this research effort will hopefully benefit 
multimedia and VE development, sampling frequency and noise level were chosen as the 
quality parameters to be manipulated. The choice of sampling frequency is similar to that 
of pixel resolution. Increasing sampling frequency corresponds to an increase in realism 
at the expense of 1) an increase in rendering time, 2) an increase in storage requirements, 
and 3) an increase in download time (if networked). Thus, the VE developer must 
carefully consider sampling frequencies. Noise level, the other parameter, was chosen 
because signal-to-noise ratio is another common quality metric of audio. Noise level, 
specifically Gaussian noise, was also chosen because of the eventual coupling 
of auditory to visual displays with varying noise levels. As such, the level of Gaussian 
noise becomes a common quality metric between both auditory and visual displays, as 
will be explained in greater detail during the description of the main experiments in the 
subsequent chapters. As with the visual displays, a final consideration in the choice of 
auditory displays was the ability to produce the various required quality levels. For 
example, if a potential quality metric cannot be produced due to software or hardware 
constraints, then that quality metric is not feasible. Since Sonic Foundry’s Sound Forge 
software [SONI98] was utilized, its capabilities provided the limits of possible quality 
parameter manipulation. As such, all the auditory displays used throughout all the 
experiments were developed using Sound Forge. 
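To illustrate how Gaussian noise level relates to signal-to-noise ratio, the following Java sketch adds zero-mean Gaussian white noise to a test tone and reports the resulting SNR in decibels. It is not a description of Sound Forge's processing: the class and method names, the 440 Hz tone, the noise standard deviations, and the random seed are all assumptions chosen for the example.

```java
/** Hypothetical sketch: adds Gaussian white noise to audio samples and reports the SNR. */
final class GaussianNoiseSketch {

    /** Generates a sine tone with amplitude 1.0. */
    static double[] tone(double frequencyHz, double sampleRateHz, int sampleCount) {
        double[] samples = new double[sampleCount];
        for (int i = 0; i < sampleCount; i++) {
            samples[i] = Math.sin(2.0 * Math.PI * frequencyHz * i / sampleRateHz);
        }
        return samples;
    }

    /** Returns a copy of the signal with zero-mean Gaussian noise (standard deviation sigma) added. */
    static double[] addNoise(double[] signal, double sigma, long seed) {
        java.util.Random rng = new java.util.Random(seed);
        double[] noisy = new double[signal.length];
        for (int i = 0; i < signal.length; i++) {
            noisy[i] = signal[i] + sigma * rng.nextGaussian();
        }
        return noisy;
    }

    /** Signal-to-noise ratio in decibels, treating (noisy - clean) as the noise. */
    static double snrDb(double[] clean, double[] noisy) {
        double signalPower = 0.0;
        double noisePower = 0.0;
        for (int i = 0; i < clean.length; i++) {
            double noise = noisy[i] - clean[i];
            signalPower += clean[i] * clean[i];
            noisePower += noise * noise;
        }
        return 10.0 * Math.log10(signalPower / noisePower);
    }

    public static void main(String[] args) {
        // One second of a 440 Hz tone at the 44,100 Hz CD sampling rate.
        double[] clean = tone(440.0, 44100.0, 44100);
        for (double sigma : new double[] {0.01, 0.05, 0.1}) {
            double[] noisy = addNoise(clean, sigma, 42L);
            System.out.printf("sigma=%.2f  SNR=%.1f dB%n", sigma, snrDb(clean, noisy));
        }
    }
}
```

The same idea applies to a visual display by treating pixel intensities as the samples, which is what makes Gaussian noise level usable as a common quality metric across both modalities.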


4. Location and Subjects 


The location for conducting all experiments was at the Naval Postgraduate School 
(NPS) in Monterey, California. To limit external environmental noises and to control 
distractions, all experiments were conducted within an isolated room (office) in which the 
experimenter had total control of audio and visual conditions. As such, scheduling 
conflicts typically associated with the main laboratory were eliminated, which greatly 
facilitated the process of running experiment sessions. Furthermore, since all experiments 
were conducted at NPS, the NPS student body provided an excellent source of engaged 
and attentive volunteer subjects. 


5. Data Analysis 


Another important consideration in the experimental design was that of the 
eventual data analysis process. The important factor was that the data collection format 
had to mesh with the data analysis process. As such, a considerable amount of time was 
spent deciding how to analyze the resulting data even before the data was collected. 
Accordingly, the chosen method of data analysis helped to derive the format of data 
collection. Since StatView [SASI98] software was chosen to do the statistical analysis of 
the experimental results, the data collection process was in turn automated to facilitate the 
ease of importing data into StatView. 


D. DESIGN SELECTIONS 


Based on the motivation and initial design considerations, a pilot study was 
designed to investigate the perceptual effects from manipulating visual display pixel 
resolution and auditory display sampling frequency. The visual display consisted of the 
aforementioned radio, and the auditory display was a selection of music. The entire 
automated experiment was contained within an HTML browser window using VRML to 
render the visual-only, auditory-only, and combined auditory-visual displays, and using 
Java pop-up windows to collect subject responses. The details of the experimental design 
are outlined in Chapter VI. The lessons learned from this pilot study were instrumental in 
designing the three main experiments of this dissertation as follows: 1) Experiment 1: 
Static Resolution, 2) Experiment 2: Static Noise, and 3) Experiment 3: Static Resolution 
NonAlphanumeric. Each experiment was fully automated and contained within an HTML 
browser window using JavaScript to render the visual-only, auditory-only, and combined 
auditory-visual displays, and using Java pop-up windows to collect subject responses. 

As its name implies, Experiment 1: Static Resolution is designed to investigate 
the perceptual effects from manipulating visual (static as opposed to dynamic) display 
pixel resolution and auditory display sampling frequency. The visual display consisted of 
the aforementioned radio, and the auditory display was a selection of music. The details of 
the experimental design are outlined in Chapter VII. 

Experiment 2: Static Noise is designed to investigate the perceptual effects from 
manipulating visual (static) display Gaussian noise level and auditory display Gaussian 
noise level. The visual display consisted of the aforementioned radio, and the auditory 
display was a selection of music. The details of the experimental design are outlined in 
Chapter VIII. 

Experiment 3: Static Resolution NonAlphanumeric is designed to investigate the 
perceptual effects from manipulating visual (static) display pixel resolution and auditory 
display sampling frequency. The visual display consisted of the aforementioned 
fruit-flower scene, and the auditory display was a selection of music. The details of the 
experimental design are outlined in Chapter IX. 


E. SOFTWARE DESIGN 


In order to better understand the type of computer programming used to develop 
the main experimental design, a brief overview of the software design and development is 
now provided. 


1. Overview 

All software used in the development of the main experimental design is 
custom-designed and encapsulated into an HTML file. For each main experiment, a total of 
nine HTML files are developed. Each HTML file corresponds to the predetermined 
randomized sequence of appropriate auditory-only, visual-only, and combined 
auditory-visual stimuli. This randomization is based on the Latin square technique (see 
[GOOD95] for a description of the technique). As such, to initiate an experiment 
testing session, the appropriate HTML file is simply executed. In an effort to minimize 
delays in rendering any of the auditory or visual stimuli, all auditory and visual displays 
(files) are pre-loaded into memory when the HTML file is first executed. 
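As an illustration of how a Latin square yields counterbalanced presentation orders, the Java sketch below builds a 9 x 9 square in which each row could serve as the stimulus order for one HTML file. The cyclic construction is an assumption made for this example; the text does not state which Latin square was actually used, and the class and method names are hypothetical.

```java
/** Hypothetical sketch: builds and checks a cyclic Latin square of stimulus orders. */
final class LatinSquareSketch {

    /** Row r, column c holds stimulus index (r + c) mod n. */
    static int[][] cyclic(int n) {
        int[][] square = new int[n][n];
        for (int row = 0; row < n; row++) {
            for (int col = 0; col < n; col++) {
                square[row][col] = (row + col) % n;
            }
        }
        return square;
    }

    /** True if every value 0..n-1 appears exactly once in each row and each column. */
    static boolean isLatin(int[][] square) {
        int n = square.length;
        for (int[] row : square) {
            if (row.length != n) {
                return false;
            }
        }
        for (int i = 0; i < n; i++) {
            boolean[] seenInRow = new boolean[n];
            boolean[] seenInCol = new boolean[n];
            for (int j = 0; j < n; j++) {
                int rowValue = square[i][j];
                int colValue = square[j][i];
                if (rowValue < 0 || rowValue >= n || colValue < 0 || colValue >= n
                        || seenInRow[rowValue] || seenInCol[colValue]) {
                    return false;
                }
                seenInRow[rowValue] = true;
                seenInCol[colValue] = true;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[][] orders = cyclic(9);
        for (int file = 0; file < orders.length; file++) {
            System.out.println("File " + (file + 1) + ": " + java.util.Arrays.toString(orders[file]));
        }
        System.out.println("Valid Latin square: " + isLatin(orders));
    }
}
```

In such a square every stimulus appears once in each serial position across the nine orders, which balances simple order effects across subjects.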


2. Development 


The development of the overall software design of the main experiment was 
divided into three main components: 1) displaying instructions, 2) auditory and visual 
display rendering, and 3) user input. 


a. Displaying Instructions 

Since the experiment is to be automated, the user (subject) is presented 
with numerous sets of instructions. The wording of the various sets of instructions was 
fine-tuned throughout the pilot study in order to eliminate any possible ambiguities. All 
the various sets of instructions were written as separate Java applets which were simply 
embedded into the main HTML code. As such, all nine HTML files shared the same Java 
instruction applets. Thus, if any one set of instructions needed to be rewritten for clarity, 
only that one set of instructions had to be rewritten and recompiled, as opposed to 
rewriting the instructions in all nine HTML files. An example of the Java programming 
code used to produce one set of instructions is depicted in Figure 34. 


import netscape.javascript.*;
import java.applet.*;
import java.awt.*;
import java.awt.event.*;

public class InstructionsAudioVisual extends Applet implements WindowListener,
        ActionListener {
    private Button EnterButton;
    private Panel EnterPanel;
    private TextArea Text;
    public JSObject win;

    public void init() {
        Text = new TextArea("\n", 9, 67, 3);
        Text.append(" (1) You will now be rating the VISUAL quality of a combined audio-visual display.\n");
        Text.append("\n");
        Text.append(" (2) A total of 9 audio-visual displays will be presented randomly.\n");
        Text.append("\n");
        Text.append(" (3) Each audio-visual display will be presented for 8 seconds");
        Text.append(" \n");
        Text.append(" (4) After which, you will be prompted ONLY for your VISUAL rating");
        Text.append("\n");
        EnterPanel = new Panel();
        EnterPanel.setLayout(new FlowLayout(FlowLayout.CENTER));
        EnterButton = new Button("Press to Continue");
        EnterButton.addActionListener(this);
        EnterPanel.add(EnterButton);
        GridBagLayout gridbag = new GridBagLayout();
        GridBagConstraints c = new GridBagConstraints();
        setFont(new Font("Helvetica", Font.PLAIN, 14));
        setLayout(gridbag);
        c.fill = GridBagConstraints.BOTH;
        c.gridwidth = GridBagConstraints.REMAINDER; //end row
        gridbag.setConstraints(Text, c);
        add(Text);
        c.gridwidth = GridBagConstraints.REMAINDER; //end row
        gridbag.setConstraints(EnterPanel, c);
        add(EnterPanel);
        c.gridwidth = GridBagConstraints.REMAINDER; //end row
    } //end init

    public void windowClosed(WindowEvent event) { }
    public void windowDeiconified(WindowEvent event) { }
    public void windowIconified(WindowEvent event) { }
    public void windowActivated(WindowEvent event) { }
    public void windowDeactivated(WindowEvent event) { }
    public void windowOpened(WindowEvent event) { }

    public void windowClosing(WindowEvent event) {
        System.gc();
    }

    public void actionPerformed(ActionEvent event) {
        Object source = event.getSource();
        if (source == EnterButton) {
            win = JSObject.getWindow(this);
            win.eval("audioVisualWrite()");
            win.eval("goToAudioVisualDisplays()");
            System.gc();
        } //end if
    } //end actionPerformed
} //end Applet


Figure 34. Example of Java Applet used to Render Instructions. 


b. Auditory and Visual Display Rendering 


All auditory and visual displays were rendered via JavaScript function 
calls within the main embedded HTML file. Figure 35 depicts a portion of the JavaScript 
programming code used to render three combined auditory-visual displays. Specifically, 
1) function HLC() is used to render a combined high-quality auditory and low-quality 
visual display; 2) function HMC() is used to render a combined high-quality auditory and 
medium-quality visual display; and 3) function HHC() is used to render a combined 
high-quality auditory and high-quality visual display. 


function HLC() {
    highWrite();
    lowWrite();
    document.highSound.play(false);
    document.images["RenderDisplays"].src = lowVisual;
    goToCombinedDisplays();
}

function HMC() {
    highWrite();
    medWrite();
    document.images["RenderDisplays"].src = medVisual;
    document.highSound.play(false);
    goToCombinedDisplays();
}

function HHC() {
    highWrite();
    highWrite();
    document.images["RenderDisplays"].src = highVisual;
    document.highSound.play(false);
    goToCombinedDisplays();
}





Figure 35. Example of JavaScript Function Calls. 
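The three function names in Figure 35 suggest a naming convention: one letter for auditory quality, one for visual quality, and a trailing C for a combined display. Assuming that convention holds for all nine combined conditions (the figure shows only three), the full set of codes can be enumerated as in the Java sketch below. The class, enum, and method names are hypothetical and are not part of the experiment software.

```java
/** Hypothetical sketch: enumerates condition codes such as "HLC" for the combined displays. */
final class CombinedConditions {

    /** Quality levels in ascending order, each with its one-letter abbreviation. */
    enum Quality {
        LOW('L'), MEDIUM('M'), HIGH('H');

        final char letter;

        Quality(char letter) {
            this.letter = letter;
        }
    }

    /** Builds a code: auditory letter, visual letter, then 'C' for combined. */
    static String code(Quality auditory, Quality visual) {
        return "" + auditory.letter + visual.letter + 'C';
    }

    /** All auditory x visual combinations, auditory level varying slowest. */
    static String[] allCodes() {
        Quality[] levels = Quality.values();
        String[] codes = new String[levels.length * levels.length];
        int next = 0;
        for (Quality auditory : levels) {
            for (Quality visual : levels) {
                codes[next++] = code(auditory, visual);
            }
        }
        return codes;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", allCodes()));
    }
}
```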


c. User Input 

All user input is accomplished via Java Frames which contain the 
appropriate rating scales. A Frame is basically a window which can be made to appear 
and disappear (i.e., a pop-up window). Figure 36 depicts a portion of the Java 
programming code used to render a visual-only rating scale. 


public class RatingScalesVisualAndRatingsTest extends Frame implements WindowListener,
        ActionListener {
    private ShowRatingScalesVisualAndRatingsTest thisScale;
    public final static String TITLE = "Visual Display Quality Rating Scale";
    Checkbox oneV, twoV, threeV, fourV, fiveV, sixV, sevenV;
    Button EnterButton;
    private Panel VisualPanel, EnterPanel;

    public RatingScalesVisualAndRatingsTest(ShowRatingScalesVisualAndRatingsTest owner) {
        super(TITLE);
        Panel VisualPanel = new Panel();
        VisualPanel.setLayout(new FlowLayout(FlowLayout.CENTER));
        VisualPanel.add(new Label(" <LOW>"));
        CheckboxGroup VisualGroup = new CheckboxGroup();
        oneV = new Checkbox("1", VisualGroup, false);
        VisualPanel.add(oneV);
        twoV = new Checkbox("2", VisualGroup, false);
        VisualPanel.add(twoV);
        threeV = new Checkbox("3", VisualGroup, false);
        VisualPanel.add(threeV);
        fourV = new Checkbox("4", VisualGroup, false);
        VisualPanel.add(fourV);
        fiveV = new Checkbox("5", VisualGroup, false);
        VisualPanel.add(fiveV);
        sixV = new Checkbox("6", VisualGroup, false);
        VisualPanel.add(sixV);
        sevenV = new Checkbox("7", VisualGroup, false);
        VisualPanel.add(sevenV);
        VisualPanel.add(new Label("<HIGH>"));
        EnterPanel = new Panel();
        EnterPanel.setLayout(new FlowLayout(FlowLayout.CENTER));
        EnterButton = new Button("Press to Continue");
        EnterButton.addActionListener(this);
        EnterPanel.add(EnterButton);
        setLayout(new GridLayout(2, 1, 1, 3));
        add(VisualPanel);
        add(EnterPanel);
        pack();
        setLocation(180, 220);
        addWindowListener(this);
        thisScale = owner;
    } //end

    public void windowClosed(WindowEvent event) {
    }

    public void windowClosing(WindowEvent event) {
        dispose();
        System.gc();
    }

    public void actionPerformed(ActionEvent event) {
        Object source = event.getSource();
        if (source == EnterButton) {
            thisScale.myReturn();
            dispose();
            System.gc();
        } //end if
    } //end actionPerformed
} //end Frame


Figure 36. Example of Java Frame used to Render Rating Scales. 


F. SUMMARY 


In summary, this chapter has provided an overview of the overall experimental 
design process of this research effort to include its motivation, design considerations, 
eventual design selections, and overall software design. 


V. VISUAL AND AUDITORY DISPLAY DEVELOPMENT 


A. INTRODUCTION 


Given that the pilot study is designed to investigate the perceptual effects from 
manipulating visual-display pixel resolution and auditory-display sampling frequency, 
the required associated visual and auditory displays need to be created. The visual display 
selected for the pilot study is a radio (Chapter IV, Figure 32), and the auditory display is 
a selection of music. The rationale for choosing a radio and music is based on the 
eventual coupling of the auditory and visual displays to form a combined auditory-visual 
display. Based on 1) psychological factors such as Gestalt perceptual grouping theory and 
the Ventriloquism Effect, and 2) neurological evidence supporting auditory-visual 
sensory interaction, an auditory-visual display consisting of a radio and music might be 
perceptually grouped together, thereby producing a more tightly coupled display. 
Furthermore, in a higher cognitive sense, we are likely to associate music (audio) with a 
radio (visual). The ultimate goal is for the combined auditory-visual display to be 
experienced as a single entity, and not as separate auditory and visual displays. The 
following describes the development process of the visual, auditory, and combined 
auditory-visual displays used in the pilot study. This development process was 
instrumental in the eventual experimental design of the three main experiments. 


B. VISUAL-DISPLAY DEVELOPMENT 


To obtain the visual image of a radio, various techniques were utilized. First, a 
digital camera was used to take pictures of a radio in various settings (i.e., indoors and 
outdoors). However, the lighting and shadowing of these digital photos proved too 
difficult to manage properly. To eliminate lighting and shadowing problems, the next 
method involved using a flatbed scanner. The radio was simply placed on the scanner, 
while the scanner recorded the image of the radio. This method actually produced fairly 
good images, but there were still minor lighting and shadowing problems. Ultimately, a 
photograph of a radio was taken from the book Radios by Hallicrafters with Price Guide 
by Chuck Dachis [DACH95]. This book contains many professionally photographed 
radios. After deliberating over the many pictures, a particular radio was finally chosen. 
This radio image was then digitized using a flatbed scanner at 600 x 600 pixel resolution. 
The color version of this radio is depicted earlier in Chapter IV, Figure 32. Since the 
visual displays of this experiment only involve the manipulation of pixel resolution, the 
overall color content (impression) of the image does not change much when changing 
pixel resolution. As a result, for the remaining discussion of this radio, all figures will be 
presented in black and white. However, it is important to emphasize that during the 
experiment, the visual displays of the radio were all presented in color. The black and 
white version of this radio at 600 x 600 pixel resolution is presented in Figure 37. This 
particular radio was chosen because it contained many different features including: letters 
and numbers, smooth and rough surfaces, straight and curved lines, patterns (on the 
speaker), and reflections. The basis for having numerous features is to provide test 
subjects with a wide variety of cues from which to make their quality ratings. 
Incidentally, in an effort to avoid any potential copyright infringements, Chuck Dachis, 
the author of the book, was contacted by telephone for the purpose of obtaining 
permission to use the photograph of the radio. Chuck Dachis gave his permission to use 
any photograph necessary for the experiments, and was very pleased that his 
photographic efforts were being used in scientific research. 


Figure 37. Visual Display of Radio at 600 pixels/inch. 

Figure 38. Obviously Different Poor-Quality Visual Display of Radio. 


Figure 39. Just-Noticeably-Different High-Quality Visual Display of Radio. 


Using the original scanned image at 600 pixels/inch, Adobe Photoshop 
[ADOB98] was then used to make various copies with degraded pixel resolutions all 
having the same dimensions, the size of which nearly fills up the display area of a 
20-inch computer monitor. Approximately 30 images of the radio ranging from 200 to 600 
pixels/inch were produced. The next step involved establishing levels of pixel resolution 
that were noticeably different, but not just-noticeably-different or obviously different. 
The goal was to establish low-, medium-, and high-quality visual displays for use in the 
experiment. An example of an obvious difference is asking a subject to compare the 
quality of Figure 37 with that of Figure 38. As one can see, the difference is obvious, 
resulting in an inconsequential response from the subject. An example that is perhaps 
just-noticeably-different is asking a subject to compare the quality of Figure 37 with 
that of Figure 39. In this case, it is fairly difficult to distinguish the quality difference 
between the two radios. The basic idea is to create changes in pixel resolution that the 
subject can distinguish, but only with some effort. This process of establishing the 
noticeable levels of pixel resolution was very time consuming. Preliminary subjects were 
presented (using the same graphics accelerator and computer monitor chosen for the 
experiment as described later) about six or seven images of the radio with varying levels 
of pixel resolution. A subject would then be asked to arrange (if possible) the images in 
ascending or descending order of quality. After repeating this process with about 15 
subjects, a consensus was finally reached which ultimately determined the low-, medium-, 
and high-quality visual displays of the radio to be used in the experiment. Resolutions of 
425 pixels/inch, 450 pixels/inch, and 500 pixels/inch were selected as the low-, medium-, 
and high-quality visual displays respectively to be used in the pilot study. In general, 
however, the actual (absolute) pixel resolution is not important, for there are numerous 
factors which affect the final rendering of the visual display such as: 1) computer monitor 
specifications, 2) computer monitor desktop size (resolution), 3) video/graphics accelerator 
specifications, and 4) software application graphics-rendering capabilities. An example of 
this last factor, in terms of the pilot study, relates to the capability of rendering textured 
images via the CosmoPlayer VRML Plugin [COSM98] to Netscape Communicator 
[NETS98]. Since the visual displays were represented as textured images in 
CosmoPlayer, the displays had to be further processed (filtered) by CosmoPlayer. This 
resulted in noticeably degraded quality in the visual displays. This fact was well known 
ahead of time and was incorporated into the initial development of the low-, medium-, 
and high-quality visual displays. As a result, the only way to actually visualize the correct 
representations of the low-, medium-, and high-quality displays selected is to view them 
through CosmoPlayer. However, because the pilot study implementation was eventually 
abandoned, it is not possible to adequately depict the visual displays as figures to view in 
this dissertation. Nevertheless, the important thing is that a relative quality ordering of the 
visual displays was established, for the intent of this research effort is to focus on the 
perceptual effects of visual displays of various quality, and not on the absolute levels of 
pixel resolution that determine these various quality displays. It is also important to note 
that even the high-quality visual display has some, albeit slight, pixel resolution 
degradation. The reason for this is based on the design of the experiment. The goal is to 
have three noticeably different quality displays based on pixel resolution, and not to have 
one display with absolutely no perceivable pixel resolution degradation and two displays 
which do have pixel resolution degradation. If this were the case, the unwanted issue of 
absence or presence of noticeable pixel resolution degradation is introduced. As such, 
subjects might be comparing the one display with no perceivable pixel resolution 
degradation to the two displays which do have pixel resolution degradation. Thus, in order 
to ensure that subjects are making quality ratings based only on degree of pixel resolution 
(not absence or presence), the high-quality display must also have a small amount of 
perceivable pixel resolution degradation. 


C. AUDITORY-DISPLAY DEVELOPMENT 


Constructing the auditory displays was much easier than constructing the visual
displays, since music can be obtained easily from any compact disc (CD). The only
consideration was the musical content. Since the quality parameter to be manipulated in
the pilot study is sampling frequency, a conscious decision was made not to include
vocals (speech). The reason is that the frequency range of speech is much narrower
than that of typical musical instruments. For example, if the sampling frequency of music
containing vocals is altered, the noticeable effect will be greater with the musical
instruments than with the vocals. As such, if subjects focused on the vocals (which is
fairly common), they might not be aware of any changes to the musical instruments.
Therefore, choosing music without vocals eliminates the possibility of subjects focusing
on speech qualities in which the changes are not perceivable. In terms of the type of music
to use, choices considered were jazz, pop, rock, alternative, and classical. The
consideration here is that if a subject is familiar with the music, the subject might have
some preconceived expectations or might make unwanted comparisons from a previous listening
experience to the auditory display that is to be evaluated. As such, to reduce the chance
that subjects might have previously heard the music, an obscure portion of alternative music
was selected. Another consideration in choosing the music was that the experimenter
would have to listen to this piece of music perhaps hundreds of times, so the
music selected was also one that the experimenter liked very much. The music
was taken from a song called A Forest from the CD Mixed Up by The Cure, which was
produced by Elektra Entertainment Group, a division of Warner Communications Inc. In
order to avoid any potential copyright infringement, a letter was written to Elektra
Records requesting permission to use portions of A Forest for scientific research. Elektra
replied with an official letter granting permission to use portions of A Forest as long as a
courtesy credit is given (see Figure 40). Thus, in accordance with Elektra’s stipulation,
portions of A Forest by The Cure, courtesy of Elektra Entertainment Group, are used in the
conduct of this experiment. (Thanks Elektra.)

Using the Mixed Up CD by The Cure, a 20-second selection of A Forest was
recorded into Sonic Foundry’s SoundForge [SONI98] at 44.1 kHz (sampling
frequency). The portion of music selected contained cymbals (among other instruments),
resulting in a very wide frequency range of sound. SoundForge was then used to
reproduce the 44.1 kHz 20-second musical selection at numerous sampling frequencies
ranging from 4 kHz to 44.1 kHz. Similar to creating the visual displays, the next step
involved establishing sampling frequencies that were noticeably different, but neither
just-noticeably-different nor obviously different. The goal was to establish low-, medium-,
and high-quality auditory displays for use in the experiment. The basic idea is to create
changes in sampling rate that the subject could distinguish, but only with some effort.
This process of establishing noticeable sampling frequencies was again very time
consuming. Preliminary subjects were presented (using the same audio card and
headphones chosen for the experiment as described later) about six or seven music
selections with varying sampling frequencies. These subjects were then asked to arrange
(if possible) the musical selections in ascending or descending order of quality. After
repeating this process with about 15 preliminary subjects, a consensus was finally
reached which ultimately determined the low-, medium-, and high-quality auditory
displays of music to be used in the experiment. Sampling rates of 11 kHz, 17 kHz, and
44.1 kHz were selected as the low-, medium-, and high-quality auditory displays
respectively for use in the pilot study. A consensus also established a constant volume
setting for the auditory displays.


February 13, 1998
VIA FAX 408-656-2814

Russell Storm
Major, US Army
Dept. of Computer Science
Naval Post Graduate School
Monterey, California 93943

Re: The Cure/“A Forest”

Gentleperson:

This will confirm that Elektra Entertainment Group, a division of Warner
Communications Inc. (“Elektra”) has no objection to your use of portions of the master
recording “A Forest” (the “Master”) performed by The Cure (“Artist”) solely for the
purposes of a scientific experiment in connection with your dissertation as described in
the attached facsimile dated January 30, 1998. You shall not distribute any copies of
the Master.

You acknowledge that as between you and Elektra, Elektra is the exclusive owner of all
rights in and to the Master for the United States and Canada, and that you will not use
the Master for any purpose other than that described above. You will be responsible for
obtaining any other required consents and making all required payments, and you
indemnify Elektra from any claims by third parties in connection with the foregoing.

You will provide a courtesy credit as follows: “A Forest” by The Cure courtesy of
“Elektra Entertainment Group”.

Please confirm your acceptance of the foregoing by signing in the space below and
returning this letter back to us. Your use of the Master shall constitute such acceptance.


Again, it is important to remember that the actual (absolute) sampling frequency is not
important, for there are numerous factors which affect the final rendering of any auditory
display, such as 1) how the original sound was produced, 2) audio card specifications,
3) rendering type (i.e., headphones or speakers), and 4) rendering type specifications.
Nevertheless, as with the visual displays, the important thing is that a relative quality
ordering of the auditory displays was established, for the intent of this research effort is
to focus on the perceptual effects of various quality auditory displays, and not on the
absolute sampling frequencies that determine these various quality displays. It is
interesting to note that the high-quality auditory display, unlike the high-quality visual
display, did not need to be slightly degraded in order to avoid the absence or presence
degradation issue which was a concern with the visual displays. The reason for this is that
our eyes are accustomed to a certain fidelity (quality), but our ears are not as discerning.
This was readily apparent during the process of selecting the three auditory display
qualities. When evaluating the various selections, not one subject could distinguish
between 44.1 kHz and 22.05 kHz, which could be attributed to the various factors involved in
the final rendering of the auditory display, as discussed earlier. Nevertheless, in terms of
the higher qualities, the ears were not as discerning when evaluating sampling frequency as
the eyes were when evaluating pixel resolution.
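
The bandwidth reasoning behind the sampling-frequency manipulation can be illustrated with a minimal Python sketch. It computes the Nyquist limit (half the sampling rate) for the three sampling rates selected above and checks whether a given frequency component would survive. The speech and cymbal frequencies used here are illustrative assumptions only; they were not measured in this research.

```python
# Sampling rates (Hz) selected for the low-, medium-, and high-quality auditory displays.
SAMPLING_RATES_HZ = {"low": 11_000, "medium": 17_000, "high": 44_100}

# Illustrative component frequencies (assumptions, not measured values).
ASSUMED_SPEECH_HZ = 3_000
ASSUMED_CYMBAL_HZ = 10_000


def nyquist_limit_hz(sampling_rate_hz: float) -> float:
    """Return the Nyquist limit, i.e., half the sampling rate."""
    return sampling_rate_hz / 2.0


def is_representable(component_hz: float, sampling_rate_hz: float) -> bool:
    """True if a frequency component lies below the Nyquist limit."""
    return component_hz < nyquist_limit_hz(sampling_rate_hz)


if __name__ == "__main__":
    for label, rate in SAMPLING_RATES_HZ.items():
        print(
            label,
            nyquist_limit_hz(rate),
            is_representable(ASSUMED_SPEECH_HZ, rate),
            is_representable(ASSUMED_CYMBAL_HZ, rate),
        )
```

Under these assumed values, the speech-band component is retained at all three sampling rates while the cymbal-band component is lost at the two lower rates, which is consistent with the decision to exclude vocals.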


D. AUDITORY-VISUAL DISPLAY DEVELOPMENT 


After establishing the visual and auditory displays, the next step was to develop
the combined auditory-visual displays. The considerations here were 1) determining how long
to render the displays, and 2) synchronizing the rendering of both auditory and visual
displays. In order to eliminate any potential confounds, the amount of time a subject is
given to view or hear the displays when presented separately must be the same as the amount
of time given to view and hear the combined auditory-visual displays. During the process of
establishing both the auditory and visual low-, medium-, and high-quality displays,
subjects were asked if they needed more or less time to view or hear the appropriate
displays. Based on a consensus, seven seconds was chosen for both displays.
Interestingly, some subjects at first thought they needed more time (around 20 seconds),
but when given more time, the subjects realized that they were changing their minds too
often about the quality, and when it came time to rate the quality of the display, they
forgot what they were thinking. The subjects then requested a shorter time duration. In a
related experiment conducted to measure the scene-dependent quality variations in
digitally coded television pictures, subjects were asked to assess distortions introduced by
Moving Picture Experts Group-2 (MPEG-2) coding (see [MPEG98]). MPEG-2 sequences of
10 and 30 seconds in length were used. One of the findings of this experiment was that the
30-second sequences were too long. This finding supports previous evidence on the duration
of human working memory (WM).


There is evidence to suggest that WM has a duration of about 20 s and that the rate of
decay in WM is dependent on the amount of information presented, as it has a limited
capacity. Both of these facets of memory can be seen as important in the results, in that
the end of the sequences are more accessible to memory recall (the recency effect) and
may bias the subjects overall rating. [PETE59] [WICK92] [ALDR95]


Although the displays in the pilot study and main experiments are static, as opposed to
motion video, the same concept of human WM applies. Therefore, based on subject
consensus and human WM theory, all displays for the pilot study, whether presented
separately or in combination, are presented to the subject for seven seconds. Having now
established all required displays, the main design of the pilot study was ready to be
developed.


E. SUMMARY 


In summary, this chapter has provided an overview of the selection and
development process of the auditory-only, visual-only, and combined auditory-visual
displays utilized in the experimental design of this research effort.


VI. PILOT STUDY


A. INTRODUCTION 


The pilot study played a crucial role in this research effort. The lessons learned
from the pilot study were essential to the development and use of appropriate auditory
and visual displays and to the overall design of the three main experiments forming the
foundation of this dissertation.


B. LOCATION 


All experiment sessions of the pilot study were conducted in the same isolated
room under the same ambient conditions. The dimensions of the room were
approximately 10 feet x 20 feet. Before each session, 1) all nonessential electronic
equipment was turned off, 2) telephones were unplugged, 3) windows were closed and
covered with blackout cloth, 4) the main overhead lights were turned off, 5) a 60-watt
incandescent desk lamp was turned on behind the computer monitor to eliminate any
glare, 6) the door to the room was closed, 7) a Do Not Disturb sign was placed on the
outside of the door, and 8) the subject was asked to turn off any audible pagers, mobile
phones, and/or watches. This last condition was added only after a subject’s beeper
sounded during an experiment session.


C. PARTICIPANTS


A total of 22 volunteer participants (6 female, 16 male), drawn from the
students, faculty, staff, and guests of NPS and ranging in age from 28 to 62, served as
subjects. All subjects were required to have 20/20 or corrected 20/20 vision and normal
hearing. Because the experiment did not involve precise measurements of pixel resolution
or sampling frequency, vision and hearing tests were not needed. Nevertheless, before
conducting the experiment, each subject was asked, as part of a voluntary consent form,
if he or she met the vision and hearing requirements.


D. APPARATUS 


A Pentium 166 MHz personal computer with 64 MBytes of main memory running
Microsoft Windows NT 4.0 served as the main hardware platform of the pilot study. The
low-, medium-, and high-quality auditory displays, described earlier, were generated by a
Sound Blaster 16 PnP audio card [CREA98] and rendered via Sennheiser HD 540
reference II headphones [SENN98]. The low-, medium-, and high-quality visual displays,
described earlier, were generated by an Elsa Gloria-8 graphics accelerator card
[ELSA98] and rendered via a Sony Multiscan 20 inch sf II computer monitor [SONY98a]
set at 800 x 600 resolution. The entire automated experiment was contained within a
Netscape Communicator 4.05 HTML browser window [NETS98] using the CosmoPlayer 2.0
VRML plug-in [COSM98] to render the visual-only, auditory-only, and combined
auditory-visual displays, and using Java pop-up windows developed using JDK 1.1.5
(Java Development Kit) [SUNM98] to collect subject responses.


E. PROCEDURE 


The experiment involved a 3 x 3 factorial within-subjects design. The two
independent variables were visual and auditory display quality. The two dependent variables
were the corresponding quality perceptions of the auditory and visual displays. The three
levels of the visual quality independent variable consisted of low-, medium-, and
high-quality visual displays of the radio image depicted earlier in Chapter IV, Figure 32,
having resolutions of 425 pixels/inch, 450 pixels/inch, and 500 pixels/inch respectively.
The three levels of the auditory quality independent variable consisted of low-, medium-,
and high-quality auditory displays of the same music selection having sampling rates of
11 kHz, 17 kHz, and 44.1 kHz respectively. As such, the visual display parameter
manipulated was pixel resolution, and the auditory display parameter manipulated was
sampling frequency. During each experiment session, which lasted approximately 30 minutes,
each subject wore headphones and sat in front of a 20-inch computer display monitor.
The task of the subject was to rate the perceived quality of auditory-only, visual-only, and
combined auditory-visual displays via rating scales as either low-, medium-, or high-quality.
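
The 3 x 3 factorial structure described above can be enumerated with a short Python sketch. The level values are those stated in the text; the dictionary keys and the function name are hypothetical and serve only to illustrate the crossing of the two independent variables.

```python
from itertools import product

# Levels of the two independent variables, as stated in the text.
VISUAL_LEVELS_PPI = {"low": 425, "medium": 450, "high": 500}          # pixels/inch
AUDITORY_LEVELS_HZ = {"low": 11_000, "medium": 17_000, "high": 44_100}  # sampling rate


def factorial_conditions():
    """Return the nine visual x auditory cells of the 3 x 3 design."""
    return [
        {
            "visual": visual,
            "auditory": auditory,
            "pixels_per_inch": VISUAL_LEVELS_PPI[visual],
            "sampling_rate_hz": AUDITORY_LEVELS_HZ[auditory],
        }
        for visual, auditory in product(VISUAL_LEVELS_PPI, AUDITORY_LEVELS_HZ)
    ]


if __name__ == "__main__":
    for cell in factorial_conditions():
        print(cell)
```

Because the design is within-subjects, every subject is exposed to all nine cells rather than to a subset of them.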

After reading a brief experimental overview and signing a voluntary consent 
form, the subject was seated in a chair facing the computer monitor. The subject was 
instructed to adjust the seat height and/or monitor orientation to that which was most 
comfortable and which represented their typical computer monitor viewing habit. 
Although a standard viewing position/orientation is much desired in experimental design, 
the focus of this experiment was not on precision, but rather perception. Accordingly, the 
idea was for subjects to be 1) relaxed, 2) comfortable, and 3) in their typical viewing
position/orientation. Nevertheless, no subject sat closer than about one foot or further than
about three feet from the monitor. The subjects were instructed on how to wear and fit the 
headphones, and also how to adjust the volume if necessary. In order to maintain 
identical testing conditions, it was hoped that no one would need to adjust the previously 
established headset volume. If a subject did adjust the headset volume, that subject’s data 
would not be included in the final data analysis. However, no subject needed to adjust the 
headset volume. 

Once the subject was seated and wearing the headphones, an automated computer
program contained within an HTML browser window instructed the subject to enter some
personal data as depicted in Figure 41. This personal data was used to create
a unique data file to collect the specific subject’s data for the remainder of the
experiment. The file created is a .csv (comma-separated values) file which can easily be
imported into Microsoft Excel. This was the only time the keyboard was
utilized. For the remainder of the experiment, only the mouse was needed. The automated
experiment continued by presenting the subject with a series of instructions giving a full
explanation of what is and is not required of the subject.

Figure 41. Pilot Study: Initial Data Input Screen.

The visual-only, auditory-only, and combined auditory-visual displays were rendered via
VRML, and Java pop-up windows collected subject responses. The primary reason for using
VRML is the eventual goal of manipulating auditory and visual displays in 3D scenes. Even
though only static visual displays are currently used, the idea was to develop the
foundation of the experiment using VRML to facilitate an easy transition to full 3D scenes.
Other considerations for using VRML are as follows: 1) it is freely downloadable, 2) it is
easy to use, 3) it has a very short learning curve, and 4) it is new technology worth
investigating.
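
The per-subject .csv data file described above might be organized as in the following Python sketch. The actual experiment software was written in Java, and the column names shown here are hypothetical; the sketch only illustrates how one row per rating could be written so that the file imports cleanly into a spreadsheet.

```python
import csv
import io

# Hypothetical column layout; the original file format is not documented here.
FIELDS = [
    "subject_id",
    "display_type",
    "visual_quality",
    "auditory_quality",
    "visual_rating",
    "auditory_rating",
    "response_time_ms",
]


def write_trials(stream, trials):
    """Write a header row followed by one row per rated display."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS, restval="")
    writer.writeheader()
    for trial in trials:
        writer.writerow(trial)  # columns missing from a trial are left empty


if __name__ == "__main__":
    buffer = io.StringIO()
    write_trials(
        buffer,
        [{"subject_id": "S01", "display_type": "visual-only",
          "visual_quality": "high", "visual_rating": "high",
          "response_time_ms": 1520}],
    )
    print(buffer.getvalue())
```

When writing to a real file rather than an in-memory buffer, the file would typically be opened with `newline=""` so that the `csv` module controls the line endings.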
As the automated experiment continues, the first set of instructions presented to
the subject is depicted in Figure 42. The idea is for the subject to memorize the quality
differences among the three displays. The same process is then repeated to give the
subject another chance to review and memorize the three quality levels.

(1) You will now see a sequence of three different visual displays.
    First, a LOW quality visual display will be shown for 7 seconds.
    Second, a MEDIUM quality visual display will be shown for 7 seconds.
    Third, a HIGH quality visual display will be shown for 7 seconds.

(2) No response is required from you at this time.

(3) Later in this experiment, you will be tested on your ability to correctly
    identify which visual display is LOW, MED, or HIGH quality.

    Therefore, at this time you should try your best to memorize
    any differences among the LOW, MED and HIGH quality visual displays.

Press to Continue

Figure 42. Pilot Study: Visual-Only Familiarization Instructions.

Next, the subject is instructed how to rate the visual-only displays as depicted in
Figure 43. After the seven seconds for which each visual display is rendered, the visual
display automatically disappears, and a Java pop-up window automatically appears to
facilitate the visual display rating as depicted in Figure 44. The subject rates a total of
nine visual-only displays (three of each quality, low, medium, and high, presented in
random order).

(1) You will now be rating the quality of the visual displays which you have just seen.

(2) A total of nine visual displays will be presented randomly.

(3) You will have 7 seconds to see each visual display.

(4) After seeing the visual display, you will be prompted for your rating.

Press to Continue

Figure 43. Pilot Study: Visual-Only Rating Instructions.

Visual Display Quality Rating Scale

Figure 44. Pilot Study: Visual Display Rating Scale.

After rating the visual-only displays, the subject uses the exact same process to rate nine
auditory-only displays (three of each quality presented in random order) by using the
auditory rating scales as depicted in Figure 45.

Figure 45. Pilot Study: Auditory Display Rating Scale.

After rating the auditory displays, the subject is presented with instructions on how to
rate the combined auditory-visual displays as depicted in Figure 46.

(1) You will now be presented a sequence of 18 various combined visual and auditory displays.

(2) These displays consist of the same visual and auditory displays which you have just
    rated with the same LOW, MEDIUM, and HIGH qualities. However, the visual and
    auditory displays will now be presented simultaneously. As a result, you might be
    presented a high quality visual display along with a low quality auditory display,
    and vice versa. Or you might be presented a high quality visual display along with
    a high quality auditory display etc, etc, ...

(3) Each combined visual and auditory display will be presented randomly for 7 seconds.

(4) After each combined visual and auditory display, you will be tested on your ability to
    correctly identify whether the visual display is LOW, MED, or HIGH quality,
    and whether the auditory display is LOW, MED, or HIGH quality.

Press to Continue

Figure 46. Pilot Study: Combined Auditory-Visual Rating Instructions.

Visual and Auditory Display Quality Rating Scales

Visual Display Quality Rating   ---> ( ) Low  ( ) Med  ( ) High <--- Visual
Auditory Display Quality Rating ---> ( ) Low  ( ) Med  ( ) High <--- Auditory

Figure 47. Pilot Study: Combined Auditory-Visual Rating Scale.

After each of the 18 combined auditory-visual displays is presented (the nine permutations
of the auditory and visual qualities are partially counterbalanced through the Latin squares
technique, and then presented in reverse order for a total of 18 combined auditory-visual
ratings), the subject rates both the auditory and visual displays using the combined
auditory-visual rating scale depicted in Figure 47. After the subject has completed rating
all of the displays, the automated portion of the experiment terminates. The subject is then
asked to complete a brief post-experiment survey consisting of 13 questions as depicted in
Figure 48 and Figure 49. After completing the post-experiment questions, the subject is
allowed to ask any overall questions about the experiment. The experiment is then
terminated, and the subject is free to go.
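
The ordering of the 18 combined auditory-visual displays, as described in the procedure above, can be sketched in Python as follows. This is only one possible reading of the Latin squares technique: it assumes a simple cyclic 9 x 9 Latin square with one row assigned per subject, followed by the same row in reverse order. The function names and the row-assignment rule are hypothetical and are not taken from the original experiment software.

```python
from itertools import product

LEVELS = ("low", "medium", "high")
# The nine (visual, auditory) quality combinations.
CONDITIONS = list(product(LEVELS, LEVELS))


def latin_square(n):
    """Cyclic n x n Latin square: each index appears once per row and column."""
    return [[(row + col) % n for col in range(n)] for row in range(n)]


def combined_sequence(subject_index):
    """Return 18 trials: one Latin-square row, then that row reversed."""
    square = latin_square(len(CONDITIONS))
    row = square[subject_index % len(CONDITIONS)]
    forward = [CONDITIONS[i] for i in row]
    return forward + forward[::-1]


if __name__ == "__main__":
    for trial_number, (visual, auditory) in enumerate(combined_sequence(0), start=1):
        print(trial_number, visual, auditory)
```

Under this scheme each combination is rated exactly twice per subject, once in each half of the sequence, so that order effects within a subject are mirrored across the two halves.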


F. RESULTS AND DISCUSSION 


The results of the pilot study proved invaluable and led to a completely
redesigned experiment. Software and hardware problems, procedural problems, and
validated experimental design criteria were identified and are discussed below.
Post Experiment Questions

For the following questions, circle the whole number that best represents your response.
Circling number 4 means you are indifferent about the question. Use only whole numbers 1
through 7. Do not use fractions.

1. How easy or difficult was it to determine the quality of the visual only displays?
   very easy- 1 2 3 4 5 6 7 -very hard

2. How easy or difficult was it to determine the quality of the auditory only displays?
   very easy- 1 2 3 4 5 6 7 -very hard

3. How easy or difficult was it to determine the quality of the auditory-visual displays?
   very easy- 1 2 3 4 5 6 7 -very hard

4. Would you have liked less or more time to view the visual only displays?
   less time- 1 2 3 4 5 6 7 -more time

5. Would you have liked less or more time to hear the auditory only displays?
   less time- 1 2 3 4 5 6 7 -more time

6. Would you have liked less or more time to hear-see the auditory-visual displays?
   less time- 1 2 3 4 5 6 7 -more time

7. Time wise, was the overall experiment too short or too long?
   too short- 1 2 3 4 5 6 7 -too long

8. Was the experiment mentally exhausting or not?
   not very- 1 2 3 4 5 6 7 -yes very

Auditory-Visual Cross-Modal Experiment (Phase 1)    Last Name:
Subject and Sequence Number:
Date:

Figure 48. Pilot Study: Post-Experiment Questions 1 - 8.
For the following questions, circle yes or no and/or make appropriate comments if applicable.

9. Did you direct your attention to any specific features of the visual display when
   determining the quality of the visual display? No Yes
   If applicable please explain:

10. Did you direct your attention to any specific features of the auditory display when
    determining the quality of the auditory display? No Yes
    If applicable please explain:

11. Were you ever mentally overloaded during any part of the experiment? No Yes
    If applicable please explain:

12. Have you participated in an experiment similar to this one? No Yes
    If applicable please explain:

13. Any other comments about what you liked or didn’t like, or things that should be changed
    during the course of this experiment?

Auditory-Visual Cross-Modal Experiment (Phase 1)    Last Name:
Subject and Sequence Number:
Date:

Figure 49. Pilot Study: Post-Experiment Questions 9 - 13.
1. Software and Hardware Problems


Perhaps the biggest problem of the pilot study was that the software and hardware
utilized proved to be unstable. A computer hardware problem, which was never isolated,
caused four complete system crashes, resulting in the need to completely reload Windows
NT and all experiment software applications. This hardware problem caused the loss of
valuable time for the subject as well as the experimenter, not to mention the loss of
irreplaceable collected data. Furthermore, the Windows NT operating system crashed on
numerous occasions during pilot study development and also during experiment sessions,
again causing a considerable loss of valuable time and data. The use of VRML also
caused unpredictable system crashes. This problem seemed to occur during Java-VRML
intercommunication, and was evidenced by the Microsoft Visual C++ Runtime
Library error number R6025: Pure Virtual Function Call. Despite numerous
attempted fixes, this unpredictable error remained. Another problem associated with
VRML was synchronizing the combined auditory-visual displays. The reason is that
the synchronization was based on the specifications of the particular audio and
video hardware utilized. As a result, the synchronization of the displays could only be
done through trial and error, which was very time consuming. Furthermore, this limits the
portability of the experiment, which in turn severely limits the possibility of
conducting future on-line experiments. Ultimately, because of the unreliable nature of the
software and hardware, the pilot study was terminated before collecting the required
number of data points to warrant proper data analysis. However, the results of the 13
subjects who successfully completed the experiment without any system crashes suggest
that further examination of auditory-visual cross-modal perception phenomena is
warranted. These results are discussed later.


2. Procedural Problems 


Identifying experimental design procedural errors was another very important
contribution of this pilot study. The main procedural errors identified were: visibility of
Netscape’s status window, rating scales default setting, time delay between ratings,
narrow range of rating scales, and memorization versus perception measurement.


a. Netscape Status Window 

When one of the test subjects was asked about the difficulty of the experiment,
he said that it was not too hard to rate the quality of the displays, for he was
simply looking at Netscape’s status window while the displays were being loaded. He
figured, correctly, that the larger the file size, the better the quality. Thus, he simply
looked at the status window, as opposed to the displays, resulting in very accurate
responses. The immediate correction to this problem was to cover the status window with a
piece of black cloth. Ultimately it was discovered that the key sequence ctrl-alt-s toggles
the appearance of Netscape’s status window.
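
The relationship between file size and quality that this subject exploited can be illustrated for the auditory displays with a brief Python sketch. The sample format (16-bit stereo, uncompressed) and the seven-second duration are assumptions made for illustration; the actual file formats and sizes used in the pilot study are not documented here.

```python
# Sampling rates (Hz) of the low-, medium-, and high-quality auditory displays.
SAMPLING_RATES_HZ = {"low": 11_000, "medium": 17_000, "high": 44_100}


def pcm_size_bytes(sampling_rate_hz, duration_s, bits_per_sample=16, channels=2):
    """Size of uncompressed PCM audio data (format parameters are assumed)."""
    bytes_per_frame = channels * bits_per_sample // 8
    return sampling_rate_hz * duration_s * bytes_per_frame


def rank_by_size(sizes):
    """Return the quality labels ordered from smallest to largest file."""
    return sorted(sizes, key=sizes.get)


if __name__ == "__main__":
    sizes = {label: pcm_size_bytes(rate, 7) for label, rate in SAMPLING_RATES_HZ.items()}
    print(sizes)
    print(rank_by_size(sizes))
```

Because the size grows in proportion to the sampling rate, ordering the files by size reproduces the quality ordering exactly, which is why any visible loading indicator had to be hidden from the subjects.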


b. Rating Scales Default Setting 


Unbeknownst to the subject, the subject’s response time to rate the various
displays was being measured. Upon analyzing the response time data, it was found that the
response times for medium-quality ratings of the auditory-only, visual-only, and combined
auditory-visual displays were significantly lower than those for high- or low-quality
ratings. In analyzing why this might be, it became apparent that the
medium-quality choice was the default radio button setting on all the rating scales as
depicted in Figure 50. As a result, if the subject were to make a medium-quality choice,
the subject need only click the Press to Continue button on the rating scale. For the low-
and high-quality choices, the subject had to select the appropriate radio button and then
click the Press to Continue button on the rating scale, which takes longer. This
problem was corrected by removing the medium-quality default choice as depicted earlier
in Figure 44.

Visual Display Quality Rating Scale

Visual Display Quality Rating ---> ( ) Low  (•) Med  ( ) High

Press to Continue

Figure 50. Pilot Study: Default Visual Quality Rating Scale.


c. Time Delay Between Ratings 


Because of how VRML was implemented in the experimental design,
there was a noticeable time delay associated with the loading and unloading of the
VRML plug-in to Netscape. Many subjects complained that this time delay caused them
to lose perspective on the relative quality ordering of the displays. Subjects wanted a
faster turn-around time between quality ratings. A possible correction to this problem is
to redesign VRML’s use so that its plug-in is only loaded once at experiment start-up.
However, given the other problems associated with VRML, the main
experiments were redesigned without 3D VRML, resulting in 2D HTML displays.


d. Narrow Range of Rating Scales 


Because of the experimental design, the range of the rating scales is small,
having only three possible values: low, med, high. This small range introduces unwanted
floor and ceiling effects. For example, if a high-quality rating is not selected, for
whatever reason, the only possible choices remaining are medium- and low-quality.
Likewise, if a low-quality rating is not selected, for whatever reason, the only possible
choices remaining are medium- and high-quality. As a result, the three-choice rating
scale reduces the ability to properly measure any degrees of perceptual effects caused by
the various quality displays. In terms of the goal of this research effort, using a
three-choice rating scale severely hampers supporting data analyses. The correction to this
problem is addressed later.
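
The floor and ceiling effects of a three-choice scale can be illustrated with a small Python sketch. It models a hypothetical perceptual shift, expressed in whole scale steps, applied to a baseline rating; the function name and the notion of a step-valued shift are assumptions introduced purely for illustration.

```python
# The three-choice scale used in the pilot study, ordered from worst to best.
SCALE = ("low", "medium", "high")


def observed_rating(baseline, shift):
    """Rating recorded on the bounded scale after a shift of `shift` steps."""
    index = SCALE.index(baseline) + shift
    clamped = max(0, min(len(SCALE) - 1, index))  # the scale cannot go past its ends
    return SCALE[clamped]


if __name__ == "__main__":
    for baseline in SCALE:
        print(baseline, observed_rating(baseline, +1), observed_rating(baseline, -1))
```

In this model an upward shift applied to a display already rated high, or a downward shift applied to a display already rated low, leaves the recorded rating unchanged, so an effect at either end of the scale cannot be observed in the data.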
e. Memorization Versus Perception Measurement


The biggest procedural error was in the overall experimental design. This
error stems from the basis by which subjects make their quality ratings. The question is
one of measurement. Given that the task of a subject was to memorize the three auditory
and visual display qualities, subjects’ responses were more likely based on their ability to
memorize the given quality differences as opposed to perceiving potential changes in
display qualities. Thus, the experiment becomes more of a matching problem as opposed
to measuring perceptual phenomena. Because of this potential error, the experiment was
completely redesigned as described in the next chapter.


3. Validated Design Criteria 


Several positive outcomes resulted from the pilot study. In analyzing the
post-experiment surveys, a seven-second duration of visual-only, auditory-only, and combined
auditory-visual displays proved desirable and adequate. The subjects’ approval also
validated the overall length of the experiment, which typically lasted around 30 minutes.
Furthermore, the responses of the subjects also suggested that with some effort, all the
displays were noticeably different. This finding was very important for it validated the
subjective relative quality ordering of the displays, which in turn validated the technique
used to develop the various quality levels of the displays.


G. SUMMARY AND CONCLUSIONS


Because of the many experimental procedure errors identified during the pilot
study, a valid data analysis of the results is neither possible nor desired. Nevertheless, a few
points are worth mentioning. In terms of memorization (the matching problem), the
subjects were better able to correctly identify the quality levels of the visual-only and
auditory-only displays, as opposed to correctly identifying the quality levels of the visual
and auditory displays when presented in combination. Some subjects were better than
others at identifying correct quality levels. In post-hoc analyses, there also appeared to be
gender differences in identifying correct quality levels as well as differences in response
times. Overall, the results of the pilot study indicate that there are differences in the
subjects’ ability to correctly match auditory-only, visual-only, and combined auditory-
visual displays, and that gender may play a role in correctly identifying the various
displays. In the final analysis, the results of the pilot study greatly facilitated a new and
improved experimental design ultimately supporting the goal of this research effort to
investigate auditory-visual cross-modal perception phenomena.


VII. EXPERIMENT 1: STATIC RESOLUTION


A. INTRODUCTION


Experiment 1: Static Resolution investigates the perceptual effects from
manipulating visual display pixel resolution and auditory display sampling frequency. 
The visual display consists of a static image of a radio depicted earlier in Chapter IV, 
Figure 32, and the auditory display is a selection of music. Specifically, the goal of this 
experiment is to answer the following questions: 


1) Does a high-quality auditory display coupled with a low-quality visual display 
cause a decrease/increase in the perception of audio quality and/or an increase/decrease in 
the perception of visual quality relative to established baseline conditions derived from 
auditory-only and visual-only quality perception evaluations? 


2) Does a low-quality auditory display coupled with a high-quality visual display 
cause an increase/decrease in the perception of audio quality and/or a decrease/increase in
the perception of visual quality relative to established baseline conditions derived from 
auditory-only and visual-only quality perception evaluations? 


3) Does a low-quality auditory display coupled with a low-quality visual display 
cause a decrease/increase in the perception of audio quality and/or a decrease/increase in 
the perception of visual quality relative to established baseline conditions derived from 
auditory-only and visual-only quality perception evaluations? 


4) Does a high-quality auditory display coupled with a high-quality visual display 
cause an increase/decrease in the perception of audio quality and/or an increase/decrease
in the perception of visual quality relative to established baseline conditions derived from 
auditory-only and visual-only quality perception evaluations? 


B. LOCATION 


All sessions of Experiment 1: Static Resolution were conducted in the same
isolated room under the same ambient conditions. The dimensions of the room were
approximately 10 feet x 20 feet. Before each session, 1) all nonessential electronic
equipment was turned off, 2) telephones were unplugged, 3) windows were closed and
covered with blackout cloth, 4) the main overhead lights were turned off, 5) a 60 watt
incandescent desk lamp was turned on behind the computer monitor to eliminate any
glare, 6) the door to the room was closed, 7) a Do Not Disturb sign was placed on the
outside of the door, and 8) the subject was asked to turn off any audible pagers, mobile
phones, and/or watches.


C. PARTICIPANTS 


A total of 36 volunteer participants (18 female, 18 male) drawn from the
students, faculty, staff, and guests of NPS served as subjects. Based on the preliminary
findings of the pilot study, the number of male and female subjects in this experiment is
balanced. The average age of the subjects is 36.5 years, ranging from 15 to 63 (two
female subjects did not give their age). All subjects were required to have 20/20 or
corrected 20/20 vision and normal hearing. Because the experiment did not involve
precise measurements of pixel resolution or sampling frequency, vision and hearing tests
were not needed. Before conducting the experiment, each subject was asked, as part of a
voluntary consent form, if he or she met the vision and hearing requirements.


D. APPARATUS


A Pentium 200 MHz (MMX) personal computer with 64 MBytes main memory
running Microsoft Windows 95 served as the main hardware platform of the experiment.
The auditory displays are generated by a Sound Blaster 64 AWE Gold audio card
[CREA98] and rendered via Sennheiser HD 540 reference II headphones [SENN98]. The
visual displays are generated by a Diamond Multimedia Viper V330 128 bit graphics
accelerator card [DIAM98] and rendered via a Sony Multiscan 20-inch sf II computer
monitor [SONY98a] set at 800 x 600 resolution. The entire automated experiment is
contained within a Netscape Communicator 4.05 HTML browser window [NETS98]
using JavaScript to render the visual-only, auditory-only, and combined auditory-visual
displays. Java pop-up windows, developed using JDK 1.1.5 [SUNM98], were used to
collect subject responses.


E. PROCEDURE


The experiment involved a 3x3 factorial within-subjects design. The two
independent variables are visual and audio display quality. The two dependent variables 
are the corresponding quality perception of the auditory and visual displays. The three 
levels of the visual quality independent variable consist of low-, medium-, and high- 
quality visual displays of the radio image depicted earlier in Chapter IV, Figure 32 
having resolutions of 350 pixels/inch, 450 pixels/inch, and 550 pixels/inch, respectively. 
The three levels of the auditory quality independent variable consist of low-, medium-, 
and high-quality auditory displays of the same music selection presented monophonically 
having sampling rates of 11 kHz, 23 kHz, and 35 kHz, respectively. As such, the visual 
display parameters manipulated are pixel resolution, and the auditory display parameters 
manipulated are sampling frequency. During the experiment, which lasts approximately
30 minutes, each subject wears headphones and sits in front of a 20-inch computer
display monitor. The task of the subject is to rate the perceived quality of auditory-only,
visual-only, and auditory-visual displays via Likert rating scales ranging from 1 (low) to 
7 (high). 
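
The nine combined conditions implied by this 3x3 design can be enumerated programmatically. The following Python sketch is illustrative only: the experiment itself was implemented in HTML, Java, and JavaScript, and the names VISUAL_LEVELS, AUDITORY_LEVELS, and combined_conditions are assumptions introduced for this example. The level values are those stated above.

```python
from itertools import product

# Level values are taken from the text; the names are illustrative assumptions.
VISUAL_LEVELS = {"low": 350, "medium": 450, "high": 550}  # pixels/inch
AUDITORY_LEVELS = {"low": 11, "medium": 23, "high": 35}   # sampling rate in kHz


def combined_conditions():
    """Return the nine (visual, auditory) quality pairings of the 3x3 design."""
    return list(product(VISUAL_LEVELS, AUDITORY_LEVELS))


if __name__ == "__main__":
    for visual, auditory in combined_conditions():
        print(
            f"visual={visual} ({VISUAL_LEVELS[visual]} pixels/inch), "
            f"auditory={auditory} ({AUDITORY_LEVELS[auditory]} kHz)"
        )
```

Each pairing corresponds to one combined auditory-visual display that a subject rates.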

After reading a brief experimental overview and signing a voluntary consent
form, the subject is seated in a chair facing the computer monitor. The subject is
instructed to adjust the seat height and/or monitor orientation to whatever is most
comfortable and represents his or her typical computer monitor viewing habit.
Although a standard viewing position/orientation is much desired in experimental design,
the focus of this experiment is not on precision, but rather perception. Accordingly, the
idea was for subjects to be 1) relaxed, 2) comfortable, and 3) in their typical viewing
position/orientation. Nevertheless, no subject sat closer than about one foot or further than
about three feet from the computer monitor. The subjects are instructed on how to wear
and fit the headphones, and also how to adjust the volume if necessary. In order to
maintain identical testing conditions, it was hoped that no one would need to adjust the
headset volume. No subject needed to adjust the headset volume.

Once the subject is seated and wearing the headphones, an automated computer
program contained within an HTML browser window instructs the subject to enter some
personal data as depicted in Figure 51. (Note that Netscape’s status window
is not visible at the bottom of the screen as compared with that of the pilot study depicted
earlier in Chapter VI, Figure 41.) This personal data is used to create a unique data file to
collect the specific subject’s data for the remainder of the experiment. The file created is
a .csv (comma-separated values) file which can easily be imported into Microsoft Excel.
This is the only time the keyboard is used. For the remainder of the
experiment, only the mouse is needed.

[Browser window: An Experiment - Netscape]

Before starting the experiment, please enter the following information about yourself

Last Name | First Name | Middle Initial
Sex (type M or F) | Age | Occupation

Subject and Sequence Number (i.e. 11, 21, etc.)

Press to Enter Your Data

For the experiment to work properly, you must press to enter your data before continuing with the experiment.

Click here to continue with the experiment.

Figure 51. Experiment 1: Data Input Screen.

The automated experiment continues by




You will now be presented two Visual Displays.

One display is of 'Low Quality' and the other is of 'High Quality'.

To see the 'Low Quality' display, click on the 'LOW QUALITY' link.

To see the 'High Quality' display, click on the 'HIGH QUALITY' link.

You can view either display as long as you like.

You can go back and forth between the displays as many times as you like.

Later in this experiment, you will be tested on your ability to correctly
identify various quality levels of visual displays. Therefore, at this time
you should try your best to memorize what is considered to be a 'Low Quality' display,
and what is considered to be a 'High Quality' display. When you are ready to
begin rating the quality of visual displays, click on the 'FINISHED' link.

Press to Continue


Figure 52. Experiment 1: Visual Display Instructions. 





presenting the subject with a series of instructions giving full explanation of what is and 
is not required of the subject. The visual-only, auditory-only, and combined auditory-
visual displays are rendered via JavaScript, and Java pop-up windows collect subject
responses.

As the automated experiment continues, the subject is first presented with a series 
of instructions, displays, and rating scales in order to 1) ensure the headphones are 
working properly, 2) familiarize the subject with how the visual displays will be 
presented on the computer monitor, and 3) familiarize the subject with what the rating 
scales look like, how they will appear and disappear automatically, and how to use them. 
After this familiarization process, the first set of instructions presented to the subject is 
depicted in Figure 52. The idea is for the subject to memorize the quality differences 
between the lowest and highest quality visual displays. As a result, the subject calibrates 
himself or herself to the maximum possible quality range spanned by the low- and high-
quality extremes. During this process, the subject has direct control in viewing the low-
and high-quality displays simply by clicking on either the LOW QUALITY or HIGH
QUALITY hypertext link. Figure 53 depicts the appearance of the low-quality visual
display having 250 pixels/inch and Figure 54 depicts the appearance of the high-quality
visual display having 600 pixels/inch. Note that the original displays were depicted in
color, and that the actual pixel resolution experienced by the subject can only be viewed
on the actual 20-inch computer monitor. However, the low- and high-quality displays
depicted in Figure 53 and Figure 54 are fairly good representations of the quality
difference between the actual displays used in the experiment.

Figure 54. Experiment 1: High-Quality Visual Display Familiarization.

When the subject is ready
to begin rating the visual displays, he or she clicks on the FINISHED hypertext link. The
subject is then presented with the instructions depicted in Figure 55. When ready, each


You will now be rating the quality of visual displays. 


Base your ratings on the Low and High visual displays depicted earlier. 


For example, if the visual display you are rating appears to look
like that of the previously shown Low quality display, your rating
should be '1' for 'Low'. If the visual display you are rating appears
to be of better quality than that of the previously shown Low quality
display, your rating should be somewhere in the range from '2' to '7'

A total of 9 visual displays will be presented randomly. 
You will have 8 seconds to see each visual display.


After seeing the visual display, you will be prompted for your rating. 


Press to Continue | 


Figure 55. Experiment 1: Visual Display Rating Instructions.

Figure 56. Experiment 1: Visual Display Quality Rating Scale.

visual display is rendered for eight seconds after which it automatically disappears, and a
Java pop-up window automatically appears to facilitate rating the visual display as
depicted in Figure 56. The subject rates a total of nine visual-only displays (three of each
quality: low, medium, and high, presented in random order). After rating the visual-only
displays, the subject uses the same process, as with the visual displays, to memorize the
quality differences between the lowest and highest quality auditory displays. The lowest 
and highest quality auditory displays corresponded to 8 kHz and 44.1 kHz respectively. 
The subject uses the exact same process, as with the visual displays, to rate nine auditory-
only displays (three of each quality presented in random order) by using the auditory
rating scales as depicted in Figure 57.

Figure 57. Experiment 1: Auditory Display Quality Rating Scale.

After rating the auditory displays, the subject is
presented with instructions on rating only the visual quality of nine combined auditory-
visual displays (the nine permutations of the auditory and visual qualities are partially
counterbalanced through the Latin squares technique) as depicted in Figure 58. The 
subject is then presented with instructions on rating only the auditory quality of nine 
combined auditory-visual displays (the nine permutations of the auditory and visual 
qualities are partially counterbalanced through the Latin squares technique) as depicted in 
Figure 59. Finally, the subject is presented with instructions on rating 18 combined 
auditory-visual displays as depicted in Figure 60. After each of the 18 combined 
auditory-visual displays is presented (the nine permutations of the auditory and visual 
qualities are partially counterbalanced through the Latin squares technique, and then 
presented in reverse order for a total of 18 combined auditory-visual ratings), the subject 
rates both the auditory and visual displays using the combined auditory-visual rating 
scale depicted in Figure 61. After the subject has completed rating all of the displays, the 
automated portion of the experiment terminates. The subject is then asked to complete a
brief post-experiment survey consisting of 13 questions. This survey is identical to the

(1) You will now be rating the VISUAL quality of a combined audio-visual display.


(2) A total of 9 audio-visual displays will be presented randomly. 
(3) Each audio-visual display will be presented for 8 seconds. 


(4) After which, you will be prompted ONLY for your VISUAL rating. 


Press to Continue | 


Figure 58. Experiment 1: Visual-Only Rating Instructions When Given A 
Combined Auditory-Visual Display. 





(1) You will now be rating the AUDIO quality of a combined audio-visual display. 
(2) A total of 9 audio-visual displays will be presented randomly. 
(3) Each audio-visual display will be presented for 8 seconds.


(4) After which, you will be prompted ONLY for your AUDIO rating.


Press to Continue


Figure 59. Experiment 1: Auditory-Only Rating Instructions When 
Given A Combined Auditory-Visual Display. 





(1) You will now be rating the audio AND visual quality of a combined audio-visual display. 
(2) A total of 18 audio-visual displays will be presented randomly.

(3) Each audio-visual display will be presented for 8 seconds.


(4) After which, you will be prompted for your audio AND visual rating. 


Press to Continue | 


Figure 60. Experiment 1: Combined Auditory-Visual Rating Instructions.

Figure 61. Experiment 1: Combined Auditory-Visual Rating Scale.





one used in the pilot study as depicted earlier in Chapter VI, Figure 48 and Figure 49. 
After completing the post-experiment questions, the subject is allowed to ask any overall
questions about the experiment. The experiment is then terminated, and the subject is free
to go.


F. CHANGES FROM PILOT STUDY 


The following discussion describes how the results from the pilot study were
implemented in the redesign of this experiment and how these implemented results
affected the overall execution of the main experiment.


1. Software and Hardware Functionality 

The new hardware platform proved to be extremely reliable and never
exhibited any problems. Switching to Microsoft Windows 95 also proved to be very
reliable since the operating system never once crashed. Eliminating the use of VRML
also eliminated the system crashes associated with the Microsoft Visual C++ Runtime
Library error number R6025: Pure Virtual Function Call. Furthermore, by using
JavaScript as opposed to VRML, the combined auditory-visual displays were
automatically synchronized when being rendered. This eliminated the trial and error
process associated with VRML, ultimately saving a lot of time and effort during the
development of the main experiment, and thereby better supporting the portability aspect
of the experiment for the eventual goal of conducting future on-line experiments.


2. Procedural Changes


a. Netscape Status Window 


The use of the black cloth to cover Netscape’s Status Window on the
computer monitor was made unnecessary by learning that the key sequence ctrl-alt-s
toggles the Status Window on and off. This not only increased the professionalism of
the experiment, but also increased, albeit slightly, the size of the viewing display area.


b. Rating Scales Default Setting 


By eliminating any default setting on the rating scales, the subject’s
response time measurement became uniform across all possible ratings, thereby allowing
proper data analysis of response time.


c. Time Delay Between Ratings 


By eliminating the use of VRML, the time required to load and unload the
VRML Plug-in was likewise eliminated. As a result, through the use of JavaScript, there
was practically no perceivable time delay between ratings. Given that the time between
ratings was now effectively instantaneous, the overall amount of time to complete the experiment
was significantly reduced. This facilitated adding additional data collection aspects to the
experimental design, while not increasing the overall duration of the experiment. As with
the pilot study, subjects completed the experiment in about 30 minutes.


d. Range of Rating Scales 

Given that the range of all rating scales was increased from three to seven
choices, the floor and ceiling effects were significantly reduced if not altogether
eliminated. This increased range provides the ability to properly measure any potential
degrees of perceptual effects caused by the various quality displays.


e. Elimination of the Matching Problem

The matching (memorization) problem of the pilot study was eliminated
by not requiring the subjects to memorize the three low, medium, and high display
qualities. In this experiment, the subject is only required to memorize the lowest and
highest possible quality extremes. During the rating process, the subject is never
reexposed to the lowest and highest quality displays. Furthermore, the subject is not
aware of how many quality levels are actually being presented. Since there are seven
possible choices on the rating scales, not three, the subject can only guess that there may
be upwards of seven possible quality levels for both the auditory and visual displays. By
only requiring the subject to memorize the lowest and highest possible quality extremes,
each subject, in essence, self-calibrates when rating the quality
displays that fall between the given lowest and highest qualities. In fact, unbeknownst to
the subject, only three quality levels (low, medium, and high) are presented. Thus, when
rating the various auditory and visual displays, the rating process becomes purely
subjective (perceptual) and not based on memorizing the exact quality level of a
particular display.


f. Duration of Displays 

During the pilot study, all displays were rendered for seven seconds;
however, in this experiment all displays were rendered for eight seconds. The reason for
increasing the length of the displays by one second had to do with the auditory display
development for the follow-on experiment, Experiment 2: Static Noise. In that
experiment, which is described in the next chapter, Gaussian white noise level is the
manipulated auditory display parameter. As such, a one half second fade-in and fade-out
of Gaussian white noise was added to the auditory display to negate the abrupt onset of
the rendered Gaussian white noise, which is somewhat shocking and startling if
unexpected. This startling effect might cause subjects to become uneasy or unnerved.
Thus, to maintain consistency of display duration among all experiments, all displays
among the experiments were rendered for eight seconds.


G. DATA COLLECTION AND ANALYSIS 


Before the results of the experiment are discussed, it is important to understand
the nature of the data collected and the chosen method of data analysis.


1. Data Collection 


To better understand the method of data analysis, it is first necessary to
understand the method of data collection. The idea of the experiment was to first capture
the subject’s quality perception of the visual-only and auditory-only displays. During this
initial portion of the experiment, subjects rate nine displays consisting of three low, three
medium, and three high qualities presented in random order. The average rated value for
each quality display establishes the subject’s baseline quality rating for each low-,
medium-, and high-quality display. This baseline quality rating can then be compared to
all subsequent quality ratings.
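
The baseline computation described above amounts to averaging the three ratings a subject gives at each quality level. The following Python sketch illustrates that arithmetic under stated assumptions: the function name baseline_ratings and the list-of-pairs input format are hypothetical, and the ratings shown are invented for illustration rather than taken from the study.

```python
from collections import defaultdict
from statistics import mean


def baseline_ratings(trials):
    """Average the 1-to-7 ratings given at each quality level.

    `trials` is a list of (quality_level, rating) pairs for one subject,
    in presentation order (an assumed input format).
    """
    by_level = defaultdict(list)
    for level, rating in trials:
        by_level[level].append(rating)
    return {level: mean(ratings) for level, ratings in by_level.items()}


if __name__ == "__main__":
    # Invented ratings for one hypothetical subject, not data from the study.
    visual_only = [
        ("low", 1), ("high", 6), ("medium", 4),
        ("low", 2), ("medium", 3), ("high", 7),
        ("medium", 4), ("low", 2), ("high", 6),
    ]
    print(baseline_ratings(visual_only))
```

The resulting per-level averages play the role of the baseline against which later combined-display ratings are compared.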

During the next portion of the experiment, subjects rate only the visual display
quality of a combined auditory-visual display. The subject is presented nine combined
auditory-visual displays corresponding to the nine permutations formed by the three 
auditory and three visual display qualities. The ordering of these nine displays is partially 
counterbalanced through the Latin squares technique. As such, the subject again rates the 
three low, three medium, and three high qualities of the visual displays. The average 
rated value for each quality display establishes the subject’s visual quality rating for each 
low-, medium-, and high-quality display when presented in combination with the three 
quality levels of the auditory displays. 
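
The text does not specify which Latin square was used for the partial counterbalancing. As a minimal sketch, assuming a simple cyclic Latin square in which each row is the condition list rotated by one position, the presentation orders could be generated as follows. The function names and the rule for assigning subjects to rows are assumptions made for illustration.

```python
from itertools import product

LEVELS = ("low", "medium", "high")


def cyclic_latin_square(conditions):
    """Build a square in which row i is the condition list rotated left by i."""
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)] for row in range(n)]


def presentation_order(subject_index, square):
    """Assign a subject to one row of the square, reusing rows cyclically."""
    return square[subject_index % len(square)]


if __name__ == "__main__":
    # Nine (visual, auditory) quality pairings.
    conditions = list(product(LEVELS, LEVELS))
    square = cyclic_latin_square(conditions)
    for position, condition in enumerate(presentation_order(0, square), start=1):
        print(position, condition)
```

In such a square every condition appears exactly once in each row and once in each serial position, which balances position effects across subjects; it does not balance every possible ordering, which is why the counterbalancing is only partial.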

During the next portion of the experiment, subjects rate only the auditory display
quality of a combined auditory-visual display. The subject is presented nine combined
auditory-visual displays corresponding to the nine permutations formed by the three
auditory and three visual display qualities. The ordering of these nine displays is again
partially counterbalanced through the Latin squares technique. As such, the subject again
rates the three low, three medium, and three high qualities of the auditory displays. The
average rated value for each quality display establishes the subject’s auditory quality
rating for each low-, medium-, and high-quality display when presented in combination
with the three quality levels of the visual displays.

During the final portion of the experiment, subjects rate both the auditory and
visual display qualities of a combined auditory-visual display. The subject is presented 18
combined auditory-visual displays corresponding to 1) the nine permutations formed by
the three auditory and three visual display qualities and 2) the reversal of the nine
permutations formed by the three auditory and three visual display qualities, all of which
are again partially counterbalanced through the Latin squares technique. As such, the
subject rates, yet again, the three low, three medium, and three high qualities of the visual
displays and the auditory displays. The average rated value for each quality display
establishes the subject’s visual and auditory quality rating for each low-, medium-, and
high-quality display when having to rate both visual and auditory displays
simultaneously. However, to conform with the next two experiments, only the first nine
of the 18 combined auditory-visual displays are utilized during data analysis.

The response time, the time to rate each display, was also collected. However, the
subject was not aware of this fact. A conscious decision was made not to inform the
subject, to avoid the possibility of the subject thinking that the faster the response, the
better the score as in some kind of race. The idea is to keep the subject as relaxed as
possible so that the subject’s decisions are based purely on perception, and not on time
(speed) related factors.


2. Data Analysis 
As in any experiment, proper/valid data analysis is critical. The first step towards
a valid data analysis involves understanding and identifying the type of data collected,
such as nominal, ordinal, interval, or continuous. In this experiment, all the quality
ratings collected are considered ordinal data. The reason for this is that the quality ratings
are derived from rating scales which are used to rank the quality perception of the
displays by giving a rating on a scale of 1 (lowest) to 7 (highest). In contrast with
interval data, the difference in quality between the low- and medium-quality displays is not
necessarily the same as the difference in quality between the medium- and high-quality
displays. This is a very important point, which must be considered when selecting the
proper data analysis method.

The underlying distribution of the data is another very important factor in
deciding how to analyze the data. Parametric data analysis can be used when assuming a
certain underlying distribution of the data. Nonparametric methods are used to test hypotheses
about data for which no underlying distribution is assumed. Thus, because
this research does not assume a certain underlying distribution of the data, a
nonparametric data analysis method is utilized. Specifically, a one sample sign test is used to
compare the number of observations above and below a certain hypothesized value,
which in this case is zero as described below. As such, to answer the questions outlined
earlier supporting the goal of this experiment, the one sample sign test is used to
investigate the following null hypotheses:

1) The difference between a) the visual-only quality rating of a combined
auditory-visual display, and b) the baseline rating for the visual-only quality display is
zero.

2) The difference between a) the auditory-only quality rating of a combined
auditory-visual display, and b) the baseline rating for the auditory-only quality display is
zero.

3) The difference between a) the visual quality rating of a combined auditory-
visual display when also rating the auditory display, and b) the baseline rating for the
visual-only quality display is zero.

4) The difference between a) the auditory quality rating of a combined auditory-
visual display when also rating the visual display, and b) the baseline rating for the
auditory-only quality display is zero.

Specifically, a one sample sign test is used to compare the number of observations
above and below zero for the difference between the baseline ratings for the auditory-only
and visual-only quality displays and 1) the visual-only quality rating of a combined auditory-visual
display, 2) the auditory-only quality rating of a combined auditory-visual display, 3) the
visual quality rating of a combined auditory-visual display when also rating the auditory
display, and 4) the auditory quality rating of a combined auditory-visual display when
also rating the visual display. The data analysis derived from the one sample sign test
forms the foundation from which all major findings in this research effort are derived. All
significant findings of this research effort are set at an alpha level of .05. As such,
only P-values at or below the .05 level will be reported as significant. The alpha level is the
probability of making a Type I Error, that is, the probability of
rejecting the null hypothesis when in fact the null hypothesis is true. The P-value is the
probability, assuming the null hypothesis is true, of observing a result at least as extreme
as the one obtained. As such, the smaller
the P-value, the stronger the evidence for rejecting the null hypothesis, which in turn
supports the alternative hypothesis (see [GOOD95] for more discussion on alpha level,
null hypothesis, alternative hypothesis, and Type I Error).
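
The one sample sign test described above can be sketched in a few lines of Python. This is an illustrative implementation, not the analysis software used in this research: the function name sign_test is hypothetical, the test is assumed to be two-sided, and observations equal to the hypothesized value are discarded, which is a common convention that the text does not confirm. The example differences are invented.

```python
from math import comb


def sign_test(differences, hypothesized=0):
    """Exact two-sided one sample sign test.

    Returns (n_above, n_below, p_value). Observations equal to the
    hypothesized value are discarded (an assumed convention).
    """
    above = sum(1 for d in differences if d > hypothesized)
    below = sum(1 for d in differences if d < hypothesized)
    n = above + below
    if n == 0:
        return above, below, 1.0
    k = min(above, below)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return above, below, min(1.0, 2 * tail)


if __name__ == "__main__":
    # Invented differences (combined-display rating minus baseline rating).
    differences = [1, 1, 1, 1, -1, 0]
    above, below, p_value = sign_test(differences)
    print(f"above={above}, below={below}, p={p_value:.4f}")
    print("significant at .05" if p_value <= 0.05 else "not significant at .05")
```

Because the test uses only the sign of each difference, it requires no assumption that the rating scale has equal intervals, which suits the ordinal data collected here.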


H. RESULTS AND DISCUSSION 


The overall results of this experiment suggest significant auditory-visual cross-
modal perception phenomena relevant to VE and multimedia developers. The major
findings of this experiment are now discussed.


1. Validity 
The first and most important consideration is whether the visual and
auditory displays developed for this experiment are rank ordered by the subjects
according to their intended quality rankings. If this were not the case, the validity of the
experiment would be jeopardized. However, in looking at Figure 62, one can see that the
overall quality ratings of the visual displays are properly rank ordered by the subjects
according to this experiment’s intended low-, medium-, and high-quality rankings.
Likewise, in looking at Figure 63, one can see that the overall quality ratings of the
auditory displays are properly rank ordered by the subjects according to this experiment’s
intended low-, medium-, and high-quality rankings. Given that the data regarding quality
of all displays are properly rank ordered, data analysis with respect to the hypotheses can
continue.

Figure 62. Experiment 1: Visual-Only Quality Percept Ratings.

2. Findings 

Figure 64 represents the results of all one sample sign tests based on the first null
hypothesis, which states: the difference between a) the visual-only quality rating of a
combined auditory-visual display, and b) the baseline rating for the visual-only quality
display is zero. As one can see from the results, when subjects were presented a combined high-quality
visual and high-quality auditory display and asked to rate only the quality of the visual
display, a statistically significant finding at the .0161 level (a P-value of .0161) suggests
that the quality perception of a high-quality visual display is increased when coupled with
a high-quality auditory display.

A2 = Low-Quality Auditory-Only Percept
A4 = Med-Quality Auditory-Only Percept
A6 = High-Quality Auditory-Only Percept

Figure 63. Experiment 1: Auditory-Only Quality Percept Ratings.

Figure 65 represents the results of all one sample sign tests based on the second 
null hypothesis which states: the difference between a) the auditory-only quality rating of 
a combined auditory-visual display, and b) the baseline rating for the auditory-only 
quality display is zero. As one can see from the results, when subjects were presented a 
combined low-quality auditory and high-quality visual display and asked to rate only the 
quality of the auditory display, a statistically significant finding at the .0002 level 
strongly suggests that the quality perception of a low-quality auditory display is 
decreased when coupled with a high-quality visual display. 

Figure 64. Experiment 1: One Sample Sign Tests for Visual-Only Quality Percept 
of Combined Auditory-Visual Displays. 


Figure 66 represents the results of all one sample sign tests based on the third null 
hypothesis which states: the difference between a) the visual quality rating of a combined 
auditory-visual display when also rating the auditory display, and b) the baseline rating 
for the visual-only quality display is zero. As one can see from the results, there are no 
significant findings at the .05 level. However, it is worth mentioning that when subjects 
were presented a high-quality visual display coupled with either a medium- or high-quality 
auditory display and asked to rate both auditory and visual displays, the results at the 
.10 level suggest that the quality perception of the high-quality visual display is 
increased. 

Figure 65. Experiment 1: One Sample Sign Tests for Auditory-Only Quality 
Percept of Combined Auditory-Visual Displays. 


Figure 67 represents the results of all one sample sign tests based on the fourth 
null hypothesis which states: the difference between a) the auditory quality rating of a 
combined auditory-visual display when also rating the visual display, and b) the baseline 
rating for the auditory-only quality display is zero. The results suggest that: 1) when 
subjects were presented a combined low-quality auditory and high-quality visual display 
and asked to rate both auditory and visual displays, a statistically significant finding at 
the .0107 level suggests that the quality perception of a low-quality auditory display is 
decreased when coupled with a high-quality visual display, and 2) when subjects were 
presented a combined high-quality auditory and low-quality visual display and asked to 
rate both auditory and visual displays, a statistically significant finding at the .0241 
level suggests that the quality perception of a high-quality auditory display is increased 
when coupled with a low-quality visual display. 

Figure 66. Experiment 1: One Sample Sign Tests for Visual Quality Percept When 
Also Rating the Auditory Display of Combined Auditory-Visual Displays. 

In terms of response times, Figure 68 represents the average visual quality rating 
response times of a combined auditory-visual display, when only asked to rate the quality 
of the visual display. Figure 69 represents the average auditory quality rating response 
times of a combined auditory-visual display, when only asked to rate the quality of the 
auditory display. Figure 70 represents the average combined auditory and visual quality 
rating response times of a combined auditory-visual display, when asked to rate both the 
auditory and visual displays. 

Figure 67. Experiment 1: One Sample Sign Tests for Auditory Quality Percept 
When Also Rating the Visual Display of Combined Auditory-Visual Displays. 

Figure 68. Experiment 1: Visual-Only Quality Rating Response Times of a 
Combined Auditory-Visual Display. 

In looking at the results of the response times, one can see various trends based on 
a particular auditory-visual quality combination. However, several factors limit the ability 
to correctly analyze these temporal results in any statistically valid manner. These factors 
are discussed in the last chapter. Nevertheless, one key observation is worth mentioning. 
The response time to rate the visual-only display of a combined auditory-visual display 
exhibited the only occasion in the entire experiment where gender seems to be a factor. In 
looking at Figure 71, it is apparent that in every condition females need more time than 
males to rate the visual displays. The reason for this is not known, but the result does 
suggest that it might be harder for females to filter out the auditory information while 
trying to attend only to the visual display. Another reason might be a result of the 
competitive nature of males. Specifically, males might have been more prone to answer 
as quickly as possible, whereas females simply took as much time as they felt they 
needed. 

Figure 69. Experiment 1: Auditory-Only Quality Rating Response Times of a 
Combined Auditory-Visual Display. 

In terms of the post-experiment questions, Figure 72 represents the subjects’ 
opinions on 1) how easy or difficult it was to determine the quality of the various displays, 
and 2) whether less or more time was needed to adequately rate the various displays. 
Keeping in mind that subjects used a Likert rating scale ranging from 1 to 7 (4 being 
neutral) to rate their opinions, the results indicate that determining the quality of both 
auditory and visual displays of a combined auditory-visual display proved to be more 
difficult than determining the quality of either auditory or visual display presented either 
alone or in combination. Furthermore, the results indicate that eight seconds was an 
adequate amount of time to rate the visual-only and auditory-only displays, but that 
slightly more than eight seconds was desired when rating the combined auditory-visual 
displays. 

Figure 70. Experiment 1: Response Times of Both Auditory and Visual 
Displays of a Combined Auditory-Visual Display. 


[Cell line chart, split by gender: cell mean of the time to rate for each of the nine 
combined auditory-visual conditions listed below.] 

V2A2 AV RT = Time to Rate Low-Quality Visual-Only Percept of Combined Low-Visual and Low-Auditory Quality Display 
V2A4 AV RT = Time to Rate Low-Quality Visual-Only Percept of Combined Low-Visual and Med-Auditory Quality Display 
V2A6 AV RT = Time to Rate Low-Quality Visual-Only Percept of Combined Low-Visual and High-Auditory Quality Display 
V4A2 AV RT = Time to Rate Med-Quality Visual-Only Percept of Combined Med-Visual and Low-Auditory Quality Display 
V4A4 AV RT = Time to Rate Med-Quality Visual-Only Percept of Combined Med-Visual and Med-Auditory Quality Display 
V4A6 AV RT = Time to Rate Med-Quality Visual-Only Percept of Combined Med-Visual and High-Auditory Quality Display 
V6A2 AV RT = Time to Rate High-Quality Visual-Only Percept of Combined High-Visual and Low-Auditory Quality Display 
V6A4 AV RT = Time to Rate High-Quality Visual-Only Percept of Combined High-Visual and Med-Auditory Quality Display 
V6A6 AV RT = Time to Rate High-Quality Visual-Only Percept of Combined High-Visual and High-Auditory Quality Display 

Figure 71. Experiment 1: Comparison of Male and Female Response Times When 
Rating a Visual-Only Display of a Combined Auditory-Visual Display. 


Finally, the remaining questions of the post-experiment survey reveal that 31 of 
the 36 subjects (86.1%) focused on alphanumerics to determine the quality of the visual 
displays, and that 20 of the 36 subjects (55.6%) felt that they were mentally overloaded 
when having to rate both auditory and visual displays simultaneously. Some very 
interesting observations were also made concerning the descriptions subjects used to 
determine the quality of the various displays. These observations are outlined in the final 
chapter. 
Q1 = How easy or difficult was it to determine the quality of the visual-only displays? 
Q2 = How easy or difficult was it to determine the quality of the auditory-only displays? 
Q3 = How easy or difficult was it to determine the visual quality of the auditory-visual displays? 
Q4 = How easy or difficult was it to determine the auditory quality of the auditory-visual displays? 
Q5 = How easy or difficult was it to determine both the auditory and visual qualities of the auditory-visual displays? 
Q6 = Would you have liked less or more time to view the visual-only displays? 
Q7 = Would you have liked less or more time to hear the auditory-only displays? 
Q8 = Would you have liked less or more time to hear-view the combined auditory-visual displays? 

Figure 72. Experiment 1: Post-Experiment Questions 1 - 8. 


I. SUMMARY AND CONCLUSIONS 


Overall the findings suggest that whether asked to specifically attend to both 
auditory and visual modalities, or asked to attend to only one modality, both similar and 
dissimilar cross-modal auditory-visual perception phenomena exist. These findings 
suggest that when manipulating visual display pixel resolution and auditory display 
sampling frequency: 


1) When attending only to the visual modality or attending to both auditory and 
visual modalities, a high-quality visual display coupled with a high-quality auditory 
display causes an increase in the perception of visual display quality relative to 
established baseline conditions derived from visual-only quality perception evaluations. 


2) When attending only to the auditory modality or attending to both auditory and 
visual modalities, a low-quality auditory display coupled with a high-quality visual 
display causes a decrease in the perception of auditory display quality relative to 
established baseline conditions derived from auditory-only quality perception 
evaluations. 


3) When attending to both auditory and visual modalities, a high-quality auditory 
display coupled with a low-quality visual display causes an increase in the perception of 
auditory display quality relative to established baseline conditions derived from auditory- 
only quality perception evaluations. 


However, would the same findings hold true when manipulating other quality 
parameters? As such, the next chapter investigates whether manipulating visual display 
Gaussian white noise level and auditory display Gaussian white noise level produces the 
same results. 
VIII. EXPERIMENT 2: STATIC NOISE 


A. INTRODUCTION 


Experiment 2: Static Noise investigates the perceptual effects from manipulating 
visual display Gaussian noise level and auditory display Gaussian noise level. The visual 
display consists of a static image of a radio depicted in Chapter IV, Figure 32, and the 
auditory display is a selection of music. As in the previous experiment, the goal of this 
experiment is to answer the following questions: 


1) Does a high-quality auditory display coupled with a low-quality visual display 
cause a decrease/increase in the perception of audio quality and/or an increase/decrease in 
the perception of visual quality relative to established baseline conditions derived from 
auditory-only and visual-only quality perception evaluations? 


2) Does a low-quality auditory display coupled with a high-quality visual display 
cause an increase/decrease in the perception of audio quality and/or a decrease/increase in 
the perception of visual quality relative to established baseline conditions derived from 
auditory-only and visual-only quality perception evaluations? 


3) Does a low-quality auditory display coupled with a low-quality visual display 
cause a decrease/increase in the perception of audio quality and/or a decrease/increase in 
the perception of visual quality relative to established baseline conditions derived from 
auditory-only and visual-only quality perception evaluations? 


4) Does a high-quality auditory display coupled with a high-quality visual display 
cause an increase/decrease in the perception of audio quality and/or an increase/decrease 
in the perception of visual quality relative to established baseline conditions derived from 
auditory-only and visual-only quality perception evaluations? 


B. LOCATION 


Because the building containing the room of the first experiment was undergoing 
electrical rewiring resulting in many power outages, the location of this experiment was 
moved to a different building. Nevertheless, all testing sessions of Experiment 2: Static 
Noise were conducted in a similar isolated room under the same ambient conditions. The 


dimensions of the room were slightly smaller than that of the first experiment at 
approximately 10 feet x 10 feet. Before each session, 1) all nonessential electronic 
equipment was turned off, 2) telephones were unplugged, 3) windows were closed and 
covered with blackout cloth, 4) the main overhead lights were turned off, 5) a 60-watt 
incandescent desk lamp was turned on behind the computer monitor to eliminate any 
glare, 6) the door to the room was closed, 7) a Do Not Disturb Sign was placed on the 
outside of the door, and 8) the subject was asked to turn off any audible pagers, mobile 
phones, and/or watches. 


C. PARTICIPANTS 


A total of 36 volunteer participants (27 Male, 9 Female) drawn from the 
students, faculty, staff, and guests of NPS served as subjects. Based on the limited gender 
findings of the first experiment (Experiment 1: Static Resolution), the number of male 
and female subjects in this experiment is not balanced. The average age of the subjects is 
36.1 years, ranging in age from 19 to 54. As with the previous experiment, all subjects 
were required to have 20/20 or corrected 20/20 vision and normal hearing. Because the 
experiment did not involve precise measurements of Gaussian noise levels, vision and 
hearing tests were not needed. Before conducting the experiment, each subject was asked, 
as part of a voluntary consent form, if he or she met the vision and hearing requirements. 


D. APPARATUS 


The apparatus used in this experiment is identical to that of Experiment 1: Static 
Resolution. See Chapter VII, Section D. 


E. PROCEDURE 


Except for a few changes which will be discussed, the procedure of this 
experiment is identical to that of the first experiment, Experiment 1: Static Resolution. 
The experiment involved a 3x3 factorial within-subjects design. The two independent 
variables are visual and audio display quality. The two dependent variables are the 
corresponding quality perception of the auditory and visual displays. The development 
process of the visual displays was identical to that of the first experiment, except that 
Gaussian white noise levels were manipulated with Adobe Photoshop [ADOB98] as 
opposed to pixel resolution. The three levels of the visual quality independent variable 
consist of low-, medium-, and high-quality visual displays of the radio image depicted in 
Chapter IV, Figure 32, having added Gaussian noise level amounts of 24, 18, and 12, 
respectively. The number corresponding to the amount of Gaussian noise is a relative 
number based on a scale of 1 to 999 that is used in Adobe Photoshop. Likewise, the 
development process of the auditory displays was identical to that of the first experiment, 
except that Gaussian noise levels of the original music selection at 44.1 kHz were 
manipulated with Sonic Foundry’s SoundForge [SONI98] as opposed to sampling 
frequency. The resulting three levels of the auditory quality independent variable consist 
of low-, medium-, and high-quality auditory displays of the same music selection 
presented monophonically at 44.1 kHz having mixed in Gaussian noise level amounts of 
31 percent, 23 percent, and 15 percent, respectively. As such, both the visual and auditory 
display parameters manipulated are Gaussian noise level. During the experiment, which 
lasts approximately 30 minutes, each subject wears headphones and sits in front of a 20- 
inch computer display monitor. The task of the subject is to rate the perceived quality of 
audio-only, visual-only, and audio-visual displays via Likert rating scales ranging from 1 
(low) to 7 (high). 
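
The 3x3 factorial design described above can be sketched as an enumeration of the nine combined auditory-visual conditions. The noise amounts are those stated in the text, and the V2/V4/V6 and A2/A4/A6 labels follow the figure legends; the dictionary and function names are assumptions introduced only for this illustration, and no presentation order is implied.

```python
from itertools import product

# Visual noise amounts on the relative Photoshop scale given in the text
# (low-, medium-, and high-quality displays, respectively).
VISUAL_NOISE = {"V2": 24, "V4": 18, "V6": 12}

# Auditory noise amounts in percent, as given in the text.
AUDITORY_NOISE_PERCENT = {"A2": 31, "A4": 23, "A6": 15}


def combined_conditions():
    """Return all nine visual x auditory quality combinations."""
    return [
        {
            "label": visual + auditory,
            "visual_noise": VISUAL_NOISE[visual],
            "auditory_noise_percent": AUDITORY_NOISE_PERCENT[auditory],
        }
        for visual, auditory in product(VISUAL_NOISE, AUDITORY_NOISE_PERCENT)
    ]


conditions = combined_conditions()
```

Each entry pairs one visual quality level with one auditory quality level, so that every subject can be exposed to all nine combinations.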

The lowest- and highest-quality auditory displays that the subjects were 
supposed to memorize during the self-calibration phase corresponded to the music 
selection at 44.1 kHz, having mixed in Gaussian noise level amounts of 45 percent and 
10 percent, respectively. The lowest- and highest-quality visual displays that the 
subjects were supposed to memorize during the self-calibration phase are depicted in 
Figure 73 and Figure 74, respectively. The low-quality visual display has an added 
Gaussian noise level amount of 45, whereas the high-quality visual display has an added 
Gaussian noise level amount of 10. Again, it is important to remember that the original 
displays were depicted in color, and that the actual Gaussian noise level experienced by 
the subject can only be viewed on the actual 20-inch computer monitor. However, the 
low- and high-quality displays depicted in Figure 73 and Figure 74 are fairly good 
representations of the quality difference between the actual displays used in the 
experiment. Besides the different auditory and visual stimuli utilized, the procedure 
continues exactly as in the previous experiment except for 1) minor changes in the 
readability of instructions, 2) an increase in the number of visual-only and auditory-only 
quality ratings, and 3) a decrease from 18 to nine combined auditory-visual ratings during 
the final portion of the experiment. These changes are now discussed. 

Figure 74. Experiment 2: High-Quality Visual Display Familiarization. 

You will now be presented two Visual Displays. 

One display is of 'Low Quality' and the other is of 'High Quality' 

To see the 'Low Quality' display, click on the 'LOW QUALITY' link 

To see the 'High Quality' display, click on the 'HIGH QUALITY' link 

You can view either display as long as you like 

You can go back and forth between the displays as many times as you like 

Later in this experiment, you will be tested on your ability to correctly 
identify various quality levels of visual displays. Therefore, at this time 
you should try your best to memorize what is considered to be a 'Low Quality' 
display, and what is considered to be a 'High Quality' display. 

When you are ready to rate the quality of visual displays, click on the 'FINISHED' link. 

Press to Continue 

Figure 75. Experiment 2: Visual Display Instructions. 

Based on the subjects’ comments on the previous experiment, the readability of 
the instructions was enhanced by adding more white space. An example of this is 
comparing the instructions from the previous experiment as depicted in Chapter VII, 
Figure 52 with the revised instructions as depicted in Figure 75. Note that the content of 
the instructions was not changed; only the readability was enhanced through increased use 
of white space. 


In order to establish a stronger confidence in the baseline ratings for the visual- 
only and auditory-only displays, the number of quality ratings made during the visual- 
only and auditory-only portions was increased from 9 to 12. However, to conform with 
the data analysis of the previous experiment, the first three ratings, consisting of one 
low-, one medium-, and one high-quality display, were disregarded. The idea was to allow 
the subject, unknowingly, to see/hear the three quality levels one time before having to 
make a rating. The baseline ratings were still based on an average of three quality ratings 
to conform with the data analysis of the previous experiment; the only result is an 
increase in the confidence of the baseline ratings, not an increase in the number of 
stimuli used to average the baseline ratings. 
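
The baseline computation described above can be sketched as follows: of the 12 ratings in a single-modality portion, the first three are discarded and the remaining nine are averaged per quality level (three ratings each). The function name, the data layout, and the example ratings are hypothetical and serve only to illustrate the procedure.

```python
from collections import defaultdict
from statistics import mean


def baseline_ratings(trials, warmup=3):
    """Average the ratings per quality level after dropping the warm-up trials.

    trials: list of (quality_level, rating) pairs in presentation order.
    warmup: number of leading ratings to disregard (three in Experiment 2).
    """
    ratings_by_level = defaultdict(list)
    for level, rating in trials[warmup:]:
        ratings_by_level[level].append(rating)
    return {level: mean(ratings) for level, ratings in ratings_by_level.items()}


# Hypothetical visual-only ratings from one subject on the 1-7 Likert scale.
trials = [
    ("V2", 1), ("V4", 4), ("V6", 7),   # first three ratings, disregarded
    ("V2", 2), ("V6", 6), ("V4", 4),
    ("V4", 5), ("V2", 3), ("V6", 7),
    ("V6", 5), ("V2", 1), ("V4", 3),
]
baselines = baseline_ratings(trials)
```

With this layout, each baseline is still the mean of exactly three ratings, as in the first experiment.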

The final portion of the experiment was also changed based on subjects’ 
comments from the previous experiment. Subjects felt that rating 18 combined auditory- 
visual displays was somewhat long and tiresome. As a result, the number of combined 
auditory-visual display ratings during the final portion of the experiment was decreased 
from 18 to 9 in an effort to maintain a higher level of subject interest. 

Again, other than the above-mentioned changes, the procedure of this experiment 
is identical to that of the previous experiment. As a result, the same data collection 
factors and data analysis are used to examine the results. 


F. RESULTS AND DISCUSSION 


As with the previous experiment, the overall results of this experiment suggest 
significant auditory-visual cross-modal perception phenomena relevant to VE and 
multimedia developers. The major findings of this experiment are now discussed. 


1. Validity 
The first and most important consideration is whether the visual and auditory displays 
developed for this experiment are rank ordered in quality by the subjects according to 
their intended rankings. If this were not the case, the validity of the experiment would 
be jeopardized. However, in looking at Figure 76, one can see that the overall quality 
ratings of the visual displays are properly rank ordered by the subjects according to this 
experiment’s intended low-, medium-, and high-quality rankings. Likewise, in looking at 
Figure 77, one can see that the overall quality ratings of the auditory displays are 
properly rank ordered by the subjects according to this experiment’s intended low-, 
medium-, and high-quality rankings. Given that the data regarding quality of all displays 
are properly rank ordered, data analysis with respect to the hypotheses can continue. 

V2 = Low-Quality Visual-Only Percept 
V4 = Med-Quality Visual-Only Percept 
V6 = High-Quality Visual-Only Percept 

Figure 76. Experiment 2: Visual-Only Quality Percept Ratings. 

Figure 77. Experiment 2: Auditory-Only Quality Percept Ratings. 

2. Findings 

Figure 78 represents the results of all one sample sign tests based on the first null 
hypothesis which states: the difference between a) the visual-only quality rating of a 
combined auditory-visual display, and b) the baseline rating for the visual-only quality 
display is zero. As one can see from the results, there are no statistically significant 
findings in any of the quality combinations. 

Figure 79 represents the results of all one sample sign tests based on the second 
null hypothesis which states: the difference between a) the auditory-only quality rating of 
a combined auditory-visual display, and b) the baseline rating for the auditory-only 
quality display is zero. As one can see from the results, 1) when subjects were presented a 
combined low-quality auditory and high-quality visual display and asked to rate only the 
quality of the auditory display, a statistically significant finding at the .0290 level 
suggests that the quality perception of a low-quality auditory display is decreased when 
coupled with a high-quality visual display, and 2) when subjects were presented a combined 
high-quality auditory and high-quality visual display and asked to rate only the quality of 
the auditory display, a statistically significant finding at the .0243 level suggests that 
the quality perception of a high-quality auditory display is increased when coupled with a 
high-quality visual display. 

Figure 78. Experiment 2: One Sample Sign Tests for Visual-Only Quality Percept 
of Combined Auditory-Visual Displays. 


Figure 80 represents the results of all one sample sign tests based on the third null 
hypothesis which states: the difference between a) the visual quality rating of a combined 
auditory-visual display when also rating the auditory display, and b) the baseline rating 
for the visual-only quality display is zero. As one can see from the results, there are no 
significant findings at the .05 level. However, it is worth mentioning that there are three 
findings at the .10 level which one can see from the figure. 


Figure 81 represents the results of all one sample sign tests based on the fourth 
null hypothesis which states: the difference between a) the auditory quality rating of a 
combined auditory-visual display when also rating the visual display, and b) the baseline 
rating for the auditory-only quality display is zero. The results suggest that: 1) when 
subjects were presented a combined medium-quality auditory and medium-quality visual 
display and asked to rate both auditory and visual displays, a statistically significant 
finding at the .0029 level suggests that the quality perception of a medium-quality 
auditory display is increased when coupled with a medium-quality visual display, and 
2) when subjects were presented a combined high-quality auditory and high-quality visual 
display and asked to rate both auditory and visual displays, a statistically significant 
finding at the .0294 level suggests that the quality perception of a high-quality auditory 
display is increased when coupled with a high-quality visual display. 

Figure 79. Experiment 2: One Sample Sign Tests for Auditory-Only Quality 
Percept of Combined Auditory-Visual Displays. 

Figure 80. Experiment 2: One Sample Sign Tests for Visual Quality Percept When 
Also Rating the Auditory Display of Combined Auditory-Visual Displays. 


In terms of response times, Figure 82 represents the average visual quality rating 
response times of a combined auditory-visual display, when only asked to rate the quality 
of the visual display. Figure 83 represents the average auditory quality rating response 
times of a combined auditory-visual display, when only asked to rate the quality of the 
auditory display. Figure 84 represents the average combined auditory and visual quality 
rating response times of a combined auditory-visual display, when asked to rate both the 
auditory and visual displays. In looking at the results of the response times, one can see 
various trends based on a particular auditory-visual quality combination. However, 
several factors limit the ability to correctly analyze these temporal results in any 
statistically valid manner. These factors are discussed in the last chapter. 

Figure 81. Experiment 2: One Sample Sign Tests for Auditory Quality Percept 
When Also Rating the Visual Display of Combined Auditory-Visual Displays. 

Figure 82. Experiment 2: Visual-Only Quality Rating Response Times of a 
Combined Auditory-Visual Display. 

In terms of the post-experiment questions, Figure 85 represents the subjects’ 
opinions on 1) how easy or difficult it was to determine the quality of the various displays, 
and 2) whether less or more time was needed to adequately rate the various displays. 
Keeping in mind that subjects used a Likert rating scale ranging from 1 to 7 (4 being 
neutral) to rate their opinions, the results indicate that determining the quality of both 
auditory and visual displays of a combined auditory-visual display proved to be more 
difficult than determining the quality of either auditory or visual display presented either 
alone or in combination. Furthermore, the results indicate that eight seconds was an 
adequate amount of time to rate the visual-only and auditory-only displays, but that 
slightly more than eight seconds was desired when rating the combined auditory-visual 
displays. 

Figure 83. Experiment 2: Auditory-Only Quality Rating Response Times of a 
Combined Auditory-Visual Display. 

Finally, the remaining questions of the post-experiment survey reveal that 29 of 
the 36 subjects (80.6%) focused on alphanumerics to determine the quality of the visual 
displays, and that only 7 of the 36 subjects (19.4%) felt that they were mentally 
overloaded when having to rate both auditory and visual displays simultaneously. As in 
the previous experiment, some very interesting observations were also made 
concerning the descriptions that the subjects used to determine the quality of the various 
displays. These observations are outlined in the final chapter. 

Figure 84. Experiment 2: Response Times of Both Auditory and Visual 
Displays of a Combined Auditory-Visual Display. 


Q1 = How easy or difficult was it to determine the quality of the visual-only displays? 
Q2 = How easy or difficult was it to determine the quality of the auditory-only displays? 
Q3 = How easy or difficult was it to determine the visual quality of the auditory-visual displays? 
Q4 = How easy or difficult was it to determine the auditory quality of the auditory-visual displays? 
Q5 = How easy or difficult was it to determine both the auditory and visual qualities of the auditory-visual displays? 
Q6 = Would you have liked less or more time to view the visual-only displays? 
Q7 = Would you have liked less or more time to hear the auditory-only displays? 
Q8 = Would you have liked less or more time to hear-view the combined auditory-visual displays? 

Figure 85. Experiment 2: Post-Experiment Questions 1 - 8. 


G. SUMMARY AND CONCLUSIONS 


Overall, the findings suggest that whether asked to specifically attend to both 
auditory and visual modalities, or asked to attend to only one modality, both similar and 
dissimilar cross-modal auditory-visual perception phenomena exist. These findings 
suggest that when manipulating both visual and auditory display Gaussian noise level: 


1) When attending only to the auditory modality, a low-quality auditory display 
coupled with a high-quality visual display causes a decrease in the perception of auditory 
quality relative to established baseline conditions derived from auditory-only quality 
perception evaluations. 


2) When attending only to the auditory modality, or attending to both auditory and 
visual modalities, a high-quality auditory display coupled with a high-quality visual 
display causes an increase in the perception of auditory quality relative to established 
baseline conditions derived from auditory-only quality perception evaluations. 


3) When attending to both auditory and visual modalities, a medium-quality auditory 
display coupled with a medium-quality visual display causes an increase in the perception 
of auditory quality relative to established baseline conditions derived from auditory-only 
quality perception evaluations. 




Thus far, the first two experiments have used a perceptually tight coupling of 
radio and music to represent the visual and auditory displays. However, might the same 
findings hold true if the auditory and visual displays were not semantically associated 
with each other? The next chapter describes the final experiment of this research effort, 
which investigates the answer to this question. 


IX. EXPERIMENT 3: STATIC RESOLUTION NONALPHANUMERIC 


A. INTRODUCTION 


Experiment 3: Static Resolution NonAlphanumeric is designed to investigate the 
perceptual effects of manipulating visual display pixel resolution and auditory display 
sampling frequency. The visual display consists of the aforementioned fruit-flower scene 
depicted in Chapter IV, Figure 33, and the auditory display is a selection of music. As in 
the previous experiments, the goal of this experiment is to investigate the following 
questions: 


1) Does a high-quality auditory display coupled with a low-quality visual display 
cause a decrease/increase in the perception of audio quality and/or an increase/decrease in 
the perception of visual quality relative to established baseline conditions derived from 
auditory-only and visual-only quality perception evaluations? 


2) Does a low-quality auditory display coupled with a high-quality visual display 
cause an increase/decrease in the perception of audio quality and/or a decrease/increase in 
the perception of visual quality relative to established baseline conditions derived from 
auditory-only and visual-only quality perception evaluations? 


3) Does a low-quality auditory display coupled with a low-quality visual display 
cause a decrease/increase in the perception of audio quality and/or a decrease/increase in 
the perception of visual quality relative to established baseline conditions derived from 
auditory-only and visual-only quality perception evaluations? 


4) Does a high-quality auditory display coupled with a high-quality visual display 
cause an increase/decrease in the perception of audio quality and/or an increase/decrease 
in the perception of visual quality relative to established baseline conditions derived from 
auditory-only and visual-only quality perception evaluations? 


B. LOCATION 


The location and ambient conditions for this experiment were identical to those of 
the previous experiment, Experiment 2: Static Noise. See Chapter VIII, Section B. 


C. PARTICIPANTS 


A total of 36 volunteer participants (14 Male, 22 Female) drawn from the 
students, faculty, staff, and guests of NPS served as subjects. Again, based on the limited 
gender findings of the first two experiments, the number of male and female subjects in 
this experiment is not balanced. The average age of the subjects is 35.5 years, ranging 
from 11 to 59 (two female subjects did not give their age). As with the previous 
experiment, all subjects were required to have 20/20 or corrected 20/20 vision and normal 
hearing. Because the experiment did not involve precise measurements of pixel resolution 
or sampling frequency, vision and hearing tests were not needed. Before conducting the 
experiment, each subject was asked, as part of a voluntary consent form, if he or she met 
the vision and hearing requirements. 


D. APPARATUS 


The apparatus used in this experiment is identical to that of the first two 
experiments: Experiment 1: Static Resolution and Experiment 2: Static Noise. See 
Chapter VII, Section D. 


E. PROCEDURE 


The procedure of this experiment is identical to that of the previous experiment, 
Experiment 2: Static Noise. The experiment involved a 3x3 factorial within-subjects 
design. The two independent variables are visual and audio display quality. The two 
dependent variables are the corresponding quality perceptions of the auditory and visual 
displays. The three levels of the visual quality independent variable consist of low-, 
medium-, and high-quality visual displays of the fruit-flower scene depicted earlier in 
Chapter IV, Figure 33, having resolutions of 34 pixels/inch, 50 pixels/inch, and 66 
pixels/inch respectively. Another key reason for using the fruit-flower scene is that it has no 
alphanumerics, hence the name of this experiment. In the previous two experiments, 60 
out of 72 subjects (83.3%) focused on alphanumerics when determining the quality of the 
visual displays. As such, another goal of this experiment is to investigate whether a lack 
of alphanumeric features has any effect on the overall ability of the subjects to determine 
the quality of the visual displays. The three levels of the auditory quality independent 
variable consist of low-, medium-, and high-quality auditory displays of the same music 
selection, presented monophonically, having sampling rates of 11 kHz, 19 kHz, and 35 
kHz respectively. As such, the visual display parameter manipulated is pixel resolution, 
and the auditory display parameter manipulated is sampling frequency. During the 
experiment, which lasts approximately 30 minutes, each subject wears headphones and 
sits in front of a 20-inch computer display monitor. The task of the subject is to rate the 
perceived quality of auditory-only, visual-only, and auditory-visual displays via Likert 
rating scales ranging from 1 (low) to 7 (high). 
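
The 3x3 factorial structure described above can be made concrete with a short sketch. The following Python fragment simply enumerates the nine auditory-visual pairings from the stated stimulus levels; the function and variable names are illustrative assumptions and are not taken from the HTML, Java, and JavaScript software actually used to run the experiments.

```python
from itertools import product

# Stimulus levels as stated in the procedure above.
VISUAL_PPI = {"low": 34, "medium": 50, "high": 66}      # pixels/inch
AUDITORY_KHZ = {"low": 11, "medium": 19, "high": 35}    # sampling rate in kHz


def factorial_conditions():
    """Return every auditory-visual pairing of the 3x3 design."""
    return [
        {
            "auditory": a,
            "auditory_khz": AUDITORY_KHZ[a],
            "visual": v,
            "visual_ppi": VISUAL_PPI[v],
        }
        for a, v in product(AUDITORY_KHZ, VISUAL_PPI)
    ]


conditions = factorial_conditions()
for c in conditions:
    print(
        f"A-{c['auditory']:<6} ({c['auditory_khz']} kHz) x "
        f"V-{c['visual']:<6} ({c['visual_ppi']} ppi)"
    )
```

Because the design is within-subjects, each subject would rate all nine of these combined displays in addition to the auditory-only and visual-only baseline displays.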

The lowest- and highest-quality auditory displays that the subjects were 
supposed to memorize during the self-calibration phase corresponded to the music 
selection at 8 kHz and 44.1 kHz respectively. The lowest- and highest-quality visual 
displays that the subjects were supposed to memorize during the self-calibration 
phase are depicted in Figure 86 and Figure 87 respectively. The low-quality visual 
display has a resolution of 28 pixels/inch, whereas the high-quality visual display has a 
resolution of 72 pixels/inch. Again, it is important to remember that the original displays 
were depicted in color, and that the actual pixel resolution experienced by the subject can 
only be viewed on the actual 20-inch computer monitor. However, the low- and high-quality 
displays depicted in Figure 86 and Figure 87 are fairly good representations of the 
quality difference between the actual displays used in the experiment. Besides the 
different auditory and visual stimuli utilized, the procedure continues exactly as in the 
previous experiment. As a result, the same data collection factors and data analysis are 
used to examine the results. 

Figure 86. Experiment 3: Low-Quality Visual Display Familiarization. 

Figure 87. Experiment 3: High-Quality Visual Display Familiarization. 

Figure 88. Experiment 3: Visual-Only Quality Percept Ratings. 


F. RESULTS AND DISCUSSION 


As with the previous experiment, the overall results of this experiment suggest 
significant auditory-visual cross-modal perception phenomena relevant to VE and 
multimedia developers. The major findings of this experiment are now discussed. 


1. Validity 

As with the previous experiments, the first and most important consideration is 
whether the visual and auditory displays developed for this experiment are 
rank ordered by the subjects according to their intended quality rankings. If this were not the 
case, the validity of the experiment would be jeopardized. However, in looking at Figure 
88, one can see that the overall quality ratings of the visual displays are properly rank 
ordered by the subjects according to this experiment’s intended low-, medium-, and 
high-quality rankings. As such, a lack of alphanumeric features has no effect on the overall 
ability of the subjects to determine the quality of the visual displays. Likewise, in looking 
at Figure 89, one can see that the overall quality ratings of the auditory displays are 
properly rank ordered by the subjects according to this experiment’s intended low-, 
medium-, and high-quality rankings. Given that the data regarding quality of all displays 
are properly rank ordered, data analysis with respect to the hypotheses can continue. 

Figure 89. Experiment 3: Auditory-Only Quality Percept Ratings. 
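
The rank-order validity check described above can be sketched in a few lines of Python. This is an illustration only: the text does not state which summary statistic underlies Figure 88 and Figure 89, so the use of the median here is an assumption, and the ratings below are invented for demonstration and are not data from this research.

```python
from statistics import median


def is_rank_ordered(ratings_by_level, order=("low", "medium", "high")):
    """Return True if the median rating strictly increases from low to high."""
    medians = [median(ratings_by_level[level]) for level in order]
    return all(a < b for a, b in zip(medians, medians[1:]))


# Hypothetical 1-7 Likert ratings, invented for illustration only.
example_ratings = {
    "low": [2, 3, 2, 1, 3],
    "medium": [4, 4, 5, 3, 4],
    "high": [6, 5, 7, 6, 6],
}

print(is_rank_ordered(example_ratings))
```

If such a check failed for either modality, the intended low-, medium-, and high-quality labels could not be trusted, and the subsequent hypothesis tests would not be meaningful.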


2. Findings 

Figure 90 represents the results of all one sample sign tests based on the first null 
hypothesis which states: the difference between a) the visual-only quality rating of a 
combined auditory-visual display, and b) the baseline rating for the visual-only quality 
display is zero. As one can see from the results, 1) when presented a combined 
high-quality visual and medium-quality auditory display, when only asked to rate the quality 
of the visual display, a statistically significant finding at the .0201 level suggests that the 
quality perception of a high-quality visual display is increased when coupled with a 
medium-quality auditory display, and 2) when presented a combined high-quality visual 
and high-quality auditory display, when only asked to rate the quality of the visual 
display, a statistically significant finding at the .0161 level suggests that the quality 
perception of a high-quality visual display is increased when coupled with a high-quality 
auditory display. 

Figure 90. Experiment 3: One Sample Sign Tests for Visual-Only Quality Percept 
of Combined Auditory-Visual Displays. 
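
To illustrate how a one sample sign test of this kind operates, the following Python sketch tests the null hypothesis that the median difference between a combined-display rating and its baseline rating is zero. It assumes a two-sided exact binomial form with tied (zero) differences discarded; the text does not state which variant or statistical package was used, so this should be read as one plausible form rather than the procedure actually applied. The difference scores are hypothetical.

```python
from math import comb


def sign_test(differences):
    """Two-sided exact one sample sign test of H0: median difference = 0.

    Zero differences (ties) are discarded, a common convention.
    Returns (n_positive, n_negative, p_value).
    """
    pos = sum(1 for d in differences if d > 0)
    neg = sum(1 for d in differences if d < 0)
    n = pos + neg
    if n == 0:
        return pos, neg, 1.0
    k = min(pos, neg)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return pos, neg, min(1.0, 2 * tail)


# Hypothetical differences: (rating in combined display) - (baseline rating).
diffs = [1, 1, 0, 2, 1, -1, 1, 0, 1, 1, 2, 1]
pos, neg, p = sign_test(diffs)
print(pos, neg, round(p, 4))
```

A preponderance of positive differences with a small p-value corresponds to the kind of finding reported above, namely that the perceived quality of a display is increased relative to its baseline. The sign test uses only the direction of each difference, which suits ordinal Likert ratings.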


Figure 91 represents the results of all one sample sign tests based on the second 
null hypothesis which states: the difference between a) the auditory-only quality rating of 
a combined auditory-visual display, and b) the baseline rating for the auditory-only 
quality display is zero. As one can see from the results, there are no statistically 
significant findings in any of the quality combinations. 

Figure 91. Experiment 3: One Sample Sign Tests for Auditory-Only Quality 
Percept of Combined Auditory-Visual Displays. 


Figure 92 represents the results of all one sample sign tests based on the third null 
hypothesis which states: the difference between a) the visual quality rating of a combined 
auditory-visual display when also rating the auditory display, and b) the baseline rating 
for the visual-only quality display is zero. As one can see from the results, when 
presented a combined high-quality visual and high-quality auditory display, when asked 
to rate both auditory and visual displays, a statistically significant finding at the .0125 
level suggests that the quality perception of a high-quality visual display is increased 
when coupled with a high-quality auditory display. 


Figure 92. Experiment 3: One Sample Sign Tests for Visual Quality Percept When 
Also Rating the Auditory Display of Combined Auditory-Visual Displays. 

Figure 93 represents the results of all one sample sign tests based on the fourth 
null hypothesis which states: the difference between a) the auditory quality rating of a 
combined auditory-visual display when also rating the visual display, and b) the baseline 
rating for the auditory-only quality display is zero. When presented a combined 
medium-quality auditory and low-quality visual display, when 
asked to rate both auditory and visual displays, a statistically significant finding at the 
.0351 level suggests that the quality perception of a medium-quality auditory display is 
decreased when coupled with a low-quality visual display. 


Figure 93. Experiment 3: One Sample Sign Tests for Auditory Quality Percept 
When Also Rating the Visual Display of Combined Auditory-Visual Displays. 

In terms of response times, Figure 94 represents the average visual quality rating 
response times of a combined auditory-visual display, when only asked to rate the quality 
of the visual display. Figure 95 represents the average auditory quality rating response 
times of a combined auditory-visual display, when only asked to rate the quality of the 
auditory display. Figure 96 represents the average combined auditory and visual quality 
rating response times of a combined auditory-visual display, when asked to rate both the 
auditory and visual displays. In looking at the results of the response times, one can see 
various trends based on a particular auditory-visual quality combination. However, 
several factors limit the ability to correctly analyze these temporal results in any 
statistically valid manner. These factors are discussed in the last chapter. 

Figure 94. Experiment 3: Visual-Only Quality Rating Response Times of a 
Combined Auditory-Visual Display. 

Figure 95. Experiment 3: Auditory-Only Quality Rating Response Times of a 
Combined Auditory-Visual Display. 

In terms of the post-experiment questions, Figure 97 represents the subjects’ 
opinions on 1) how easy or difficult it was to determine the quality of the various displays, 
and 2) if less or more time was needed to adequately rate the various displays. Keeping in 
mind that subjects used a Likert rating scale ranging from 1 to 7 (4 being neutral) to rate 
their opinions, the results indicate that determining the quality of both auditory and visual 
displays of a combined auditory-visual display proved to be more difficult than 
determining the quality of either auditory or visual display presented either alone or in 
combination. Furthermore, the results indicate that eight seconds was an adequate amount 
of time to rate the visual-only and auditory-only displays, but that slightly more than eight 
seconds was desired when rating the combined auditory-visual displays. 

Figure 96. Experiment 3: Response Times of Both Auditory and Visual 
Displays of a Combined Auditory-Visual Display. 

Finally, the remaining questions of the post-experiment survey reveal that only 9 
of the 36 subjects (25.0%) felt that they were mentally overloaded when having to rate 
both auditory and visual displays simultaneously. As in the previous experiment, some 
very interesting observations were also made concerning the descriptions that the 
subjects used to determine the quality of the various displays. These observations are 
outlined in the final chapter. 


Figure 97. Experiment 3: Post-Experiment Questions 1 - 8. 


G. SUMMARY AND CONCLUSIONS 


Overall, the findings suggest that whether asked to specifically attend to both 
auditory and visual modalities, or asked to attend to only one modality, both similar and 
dissimilar cross-modal auditory-visual perception phenomena exist. These findings 
suggest that when manipulating visual display pixel resolution and auditory display 
sampling frequency: 


1) When attending only to the visual modality, a high-quality visual display coupled 
with a medium-quality auditory display causes an increase in the perception of visual 
quality relative to established baseline conditions derived from visual-only quality 
perception evaluations. 


2) When attending only to the visual modality, or attending to both auditory and 
visual modalities, a high-quality visual display coupled with a high-quality auditory 
display causes an increase in the perception of visual quality relative to established 
baseline conditions derived from visual-only quality perception evaluations. 


3) When attending to both auditory and visual modalities, a medium-quality auditory 
display coupled with a low-quality visual display causes a decrease in the perception of 
auditory quality relative to established baseline conditions derived from auditory-only 
quality perception evaluations. 


Therefore, even though the auditory and visual displays were not perceptually 
tightly coupled as in the first two experiments, the results indicate 
that the effects of auditory-visual cross-modal perception phenomena persist. The next 
chapter presents an overview of the combined results from all three experiments. 


X. SUMMARY AND CONCLUSIONS 


A. INTRODUCTION 


This chapter represents the culmination of two and a half years of research and 
development in support of evidence concerning auditory-visual cross-modal perception 
phenomena. The overall results, conclusions, impact, observations, recommendations, 
future work, and final thoughts are presented. 


B. OVERALL RESULTS 


Because all collected data were derived from identical experimental conditions 
based on the same low-, medium-, and high-quality ordering of the auditory and visual 
stimuli, combining datasets from all three experiments is justified in order to consider 
overall results. As such, the following are the overall results from combining the datasets 
from all three experiments. 


1. Participants 

Overall, a total of 108 volunteer participants (59 Male, 49 Female) drawn 
from the students, faculty, staff, and guests of NPS served as subjects. The overall 
average age of the subjects is 36.1 years, ranging from 11 to 63 (four female 
subjects did not give their age). All subjects were required to have 20/20 or corrected 
20/20 vision and normal hearing. As such, before conducting the experiment, each subject 
was asked, as part of a voluntary consent form, if he or she met the vision and hearing 
requirements. 


2. Validity 
Again, the first and most important consideration is whether the overall quality of 
the visual and auditory displays are rank ordered by the subjects according to their 
intended rankings. In looking at Figure 98, one can see that the overall quality ratings of 
the visual displays are properly rank ordered by the subjects. Likewise, in looking at 
Figure 99, one can see that the overall quality ratings of the auditory displays are properly 
rank ordered by the subjects. Given that the data regarding quality of all displays are 
properly rank ordered, data analysis with respect to the hypotheses can continue. 

Figure 98. Combined Data: Visual-Only Quality Percept Ratings. 


3. Overall Findings 


Figure 99. Combined Data: Auditory-Only Quality Percept Ratings. 

Figure 100 represents the results of all one sample sign tests based on the first null 
hypothesis which states: the difference between a) the visual-only quality rating of a 
combined auditory-visual display, and b) the baseline rating for the visual-only quality 
display is zero. As one can see from the results, 1) when presented a combined 
high-quality visual and medium-quality auditory display, when only asked to rate the quality 
of the visual display, a statistically significant finding at the .0124 level suggests that the 
quality perception of a high-quality visual display is increased when coupled with a 
medium-quality auditory display, and 2) when presented a combined high-quality visual 
and high-quality auditory display, when only asked to rate the quality of the visual 
display, a statistically significant finding at the .0002 level strongly suggests that the 
quality perception of a high-quality visual display is increased when coupled with a 
high-quality auditory display. 

Figure 100. Combined Data: One Sample Sign Tests for Visual-Only Quality 
Percept of Combined Auditory-Visual Displays. 

Figure 101 represents the results of all one sample sign tests based on the second 
null hypothesis which states: the difference between a) the auditory-only quality rating of 
a combined auditory-visual display, and b) the baseline rating for the auditory-only 
quality display is zero. As one can see from the results, 1) when presented a combined 
low-quality auditory and medium-quality visual display, when only asked to rate the 
quality of the auditory display, a statistically significant finding at the .0375 level 
suggests that the quality perception of a low-quality auditory display is decreased when 
coupled with a medium-quality visual display, and 2) when presented a combined 
low-quality auditory and high-quality visual display, when only asked to rate the quality of 
the auditory display, a statistically significant finding at the .0002 level strongly suggests 
that the quality perception of a low-quality auditory display is decreased when coupled 
with a high-quality visual display. 


Figure 101. Combined Data: One Sample Sign Tests for Auditory-Only Quality 
Percept of Combined Auditory-Visual Displays. 

Figure 102 represents the results of all one sample sign tests based on the third 
null hypothesis which states: the difference between a) the visual quality rating of a 
combined auditory-visual display when also rating the auditory display, and b) the 
baseline rating for the visual-only quality display is zero. As one can see from the results, 
1) when presented a combined high-quality visual and low-quality auditory display, when 
asked to rate both auditory and visual displays, a statistically significant finding at the 
.0172 level suggests that the quality perception of a high-quality visual display is 
increased when coupled with a low-quality auditory display, and 2) when presented a 
combined high-quality visual and medium-quality auditory display, when asked to rate 
both auditory and visual displays, a statistically significant finding at the .0042 level 
strongly suggests that the quality perception of a high-quality visual display is increased 
when coupled with a medium-quality auditory display, and 3) when presented a 
combined high-quality visual and high-quality auditory display, when asked to rate both 
auditory and visual displays, a statistically significant finding at the .0034 level strongly 
suggests that the quality perception of a high-quality visual display is increased when 
coupled with a high-quality auditory display. 

Figure 102. Combined Data: One Sample Sign Tests for Visual Quality Percept 
When Also Rating the Auditory Display of Combined Auditory-Visual Displays. 


Figure 103 represents the results of all one sample sign tests based on the fourth 
null hypothesis which states: the difference between a) the auditory quality rating of a 
combined auditory-visual display when also rating the visual display, and b) the baseline 
rating for the auditory-only quality display is zero. The results suggest that there are no 
statistically significant findings in any of the quality combinations. However, it is worth 
mentioning that when presented a combined low-quality auditory and high-quality visual 
display, when asked to rate both auditory and visual displays, the results at the .0586 
level suggest that the quality perception of a low-quality auditory display is decreased 
when coupled with a high-quality visual display. 

Figure 103. Combined Data: One Sample Sign Tests for Auditory Quality Percept 
When Also Rating the Visual Display of Combined Auditory-Visual Displays. 


In terms of response times, Figure 104 represents the overall average visual 
quality rating response times of a combined auditory-visual display, when only asked to 
rate the quality of the visual display. Figure 105 represents the overall average auditory 
quality rating response times of a combined auditory-visual display, when only asked to 
rate the quality of the auditory display. Figure 106 represents the overall average 
combined auditory and visual quality rating response times of a combined auditory-visual 
display, when asked to rate both the auditory and visual displays. Again, in looking at 
the overall results of the response times, one can see various trends; however, several 
factors limit the ability to correctly analyze these temporal results in any statistically 
valid manner. These factors are discussed in the OBSERVATIONS section below. 

Figure 104. Combined Data: Visual-Only Quality Rating Response Times of a 
Combined Auditory-Visual Display. 

A4V4 AV RT = Time to Rate Med-Quality Auditory-Only Percept of Combined Med-Auditory and Med-Visual Quality Display 
A4V6 AV RT = Time to Rate Med-Quality Auditory-Only Percept of Combined Med-Auditory and High-Visual Quality Display 
A6V2 AV RT = Time to Rate High-Quality Auditory-Only Percept of Combined High-Auditory and Low-Visual Quality Display 
A6V4 AV RT = Time to Rate High-Quality Auditory-Only Percept of Combined High-Auditory and Med-Visual Quality Display 
A6V6 AV RT = Time to Rate High-Quality Auditory-Only Percept of Combined High-Auditory and High-Visual Quality Display 

Figure 105. Combined Data: Auditory-Only Quality Rating Response Times of a 
Combined Auditory-Visual Display. 

In terms of the post-experiment questions, Figure 107 represents the overall 
subjects’ opinions on 1) how easy or difficult it was to determine the quality of the various 
displays, and 2) if less or more time was needed to adequately rate the various displays. 
Keeping in mind that subjects used a Likert rating scale ranging from 1 to 7 (4 being 
neutral) to rate their opinions, the overall results indicate that determining the quality of 
both auditory and visual displays of a combined auditory-visual display proved to be 
more difficult than determining the quality of either auditory or visual display presented 
either alone or in combination. Furthermore, the results indicate that eight seconds was an 
adequate amount of time overall to rate the visual-only and auditory-only displays, but that 
slightly more than eight seconds was desired when rating the combined auditory-visual 
displays. 

A4V2 CAV RT = Time to Rate Both Med-Auditory and Low-Visual Quality Displays of a Combined Display 

Figure 106. Combined Data: Response Times of Both Auditory and Visual 
Displays of a Combined Auditory-Visual Display. 

Finally, the remaining questions of the post-experiment survey reveal that 60 out 
of 72 subjects (83.3%) focused on alphanumerics when determining the quality of the 
visual displays (only applicable in the first two experiments) and that 36 of the 108 
subjects (33.3%) felt that they were mentally overloaded when having to rate both 
auditory and visual displays simultaneously. 
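
As a simple arithmetic check, the percentages reported above can be recomputed from the stated counts. The Python sketch below does this and, under the assumption that the overall overload count is the sum across the three experiments, derives by subtraction the count for the first two experiments; that derived value is an inference made here and is not a figure reported in the text.

```python
def percent(count, total):
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


# Counts reported in the text.
alnum_focus, alnum_total = 60, 72   # Experiments 1 and 2 only
overload_exp3, n_exp3 = 9, 36       # Experiment 3
overload_all, n_all = 36, 108       # all three experiments combined

# Derived by subtraction (an inference, not a value reported in the text).
overload_exp12 = overload_all - overload_exp3
n_exp12 = n_all - n_exp3

print(percent(alnum_focus, alnum_total))
print(percent(overload_exp3, n_exp3))
print(percent(overload_all, n_all))
print(overload_exp12, n_exp12, percent(overload_exp12, n_exp12))
```

Under that assumption, the proportion of subjects reporting mental overload would have been somewhat higher in the first two experiments than in Experiment 3.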


Q1 = How easy or difficult was it to determine the quality of the visual-only displays? 
Q2 = How easy or difficult was it to determine the quality of the auditory-only displays? 
Q3 = How easy or difficult was it to determine the visual quality of the auditory-visual displays? 
Q4 = How easy or difficult was it to determine the auditory quality of the auditory-visual displays? 
Q5 = How easy or difficult was it to determine both the auditory and visual qualities of the auditory-visual displays? 
Q6 = Would you have liked less or more time to view the visual-only displays? 
Q7 = Would you have liked less or more time to hear the auditory-only displays? 
Q8 = Would you have liked less or more time to hear-view the combined auditory-visual displays? 

Figure 107. Combined Data: Post-Experiment Questions 1 - 8. 


C. OVERALL CONCLUSIONS 


The goal of this research has been achieved. By varying the quality (fidelity) of 
both auditory and visual displays, it has been possible to measure auditory-visual 
cross-modal perception phenomena. The overall conclusions suggest that 1) whether asked to 
specifically attend to both auditory and visual modalities or asked to attend to only one 
modality, 2) whether manipulating visual display pixel resolution or Gaussian noise level, 
3) whether manipulating auditory display sampling frequency or Gaussian noise level, or 
4) whether an auditory-visual display is tightly or loosely coupled, cross-modal 
auditory-visual perception phenomena exist. Overall, these findings strongly suggest: 


1) When attending only to the visual modality, a high-quality visual display 
coupled with either a medium- or high-quality auditory display causes an increase in the 
perception of visual quality relative to established baseline conditions derived from 
visual-only quality perception evaluations. 


2) When attending only to the auditory modality, a low-quality auditory display 
coupled with either a medium- or high-quality visual display causes a decrease in the 
perception of auditory quality relative to established baseline conditions derived from 
auditory-only quality perception evaluations. 


3) When attending to both auditory and visual modalities, a high-quality visual 
display coupled with a low-, medium-, or high-quality auditory display causes an increase 
in the perception of visual quality relative to established baseline conditions derived from 
visual-only quality perception evaluations. 

Another finding worth mentioning, which is just slightly above the level of statistical 
significance set for this research, is that when attending to both auditory and visual 
modalities, a low-quality auditory display coupled with a high-quality visual display 
causes a decrease in the perception of auditory quality relative to established baseline 
conditions derived from auditory-only quality perception evaluations. 

Overall, these results provide the empirical evidence to support what most people 
in the gaming business, multimedia industry, entertainment industry, and VE community 
have suspected all along: that audio can influence the quality perception of video, and 
that video can influence the quality perception of audio. The results also indicate that 
although we can divide our attention between audition and vision, we are not consciously 
aware of potentially significant intermodality effects. 


D. IMPACT 


Because of the multi-disciplinary nature of this research effort, the impact of the 
overall findings is far-reaching, having both theoretical and commercial implications. 


1. Theoretical Impact 
The theoretical impact of the findings in this study is diverse. The following 
describes the impact on Sensory Interaction, Visual Dominance, Divided Attention, and 
Time-Sharing. 


a. Sensory Interaction 

Because the overall findings indicate that auditory quality can influence 
visual quality perception and vice versa, some sort of sensory interaction must be taking 
place. These findings support the many conclusions outlined earlier in Chapter II, Section 
C. For example, these findings support the early intersensory research conclusions of 
both Ryan [RYAN40] and Gilbert [GILB41]. Also, O’Connor and Hermelin [OCON81] 
would argue that these findings support the concept of sensory capture. But how this 
sensory interaction occurs is still not known. Stein and Meredith [STEI93] might 
conclude that this interaction could be taking place at the neurological level based on 
single multi-modal neurons as depicted earlier in Figure 4 and Figure 5. However, 
Gibson [GIBS66] [GIBS79] might argue that this sensory interaction is based on the 
complexity of natural life events. 


b. Visual Dominance 


One of the overall findings of this research effort suggests that when 
attending only to the auditory modality, a low-quality auditory display coupled with 
either a medium- or high-quality visual display causes a decrease in the perception of 
auditory quality. The reason for degrading the perception of the auditory quality might be 
based on the concept of visual dominance discussed earlier in Chapter II, Section E and 
Chapter III, Section F. Perhaps at some higher cognitive level, the higher-quality visual 
display is being compared with the lower-quality auditory display. This unconscious 
comparison might cause one to perceive that the auditory quality is worse than it actually 
is because of the dominating nature of the visual modality. 


c. Divided Attention 

The overall findings of this research indicate that humans can effectively 
divide their attention between the auditory and visual sensory modalities. This ability to 
divide one’s attention between the auditory and visual sensory modalities supports the 
various attention theories discussed earlier in Chapter II, Section F. 


d. Time-Sharing 
Although this research supports the ability to divide attention among the 
auditory and visual sensory modalities, the time-sharing question remains: do we process 
these simultaneous auditory and visual stimuli in parallel or in serial? If the overall 
results indicate that we process simultaneous auditory and visual stimuli in serial, this 
would lend support to the Single-Resource Theory discussed earlier in Chapter II, Section F. 
If the overall results indicate that we process simultaneous auditory and visual stimuli in 
parallel, this would lend support to the Multiple-Resource Theory discussed earlier in 
Chapter II, Section F. Since 33.3% of all subjects felt that they were mentally overloaded 
when having to rate both auditory and visual displays simultaneously, one might 
conclude that these particular subjects did not have adequate time to rate 
both auditory and visual displays in a serial manner and therefore had to process the 
simultaneous auditory and visual displays in parallel, which was mentally overloading. If 
this were true, it would lend support to the Multiple-Resource Theory. However, it is 
important to note that in this research effort, no assumptions can be made as to how the 
subjects processed the simultaneous auditory and visual stimuli. Consequently, no 
time-sharing conclusions can be drawn from the overall results of this research effort. 


2. Commercial Impact 


The commercial impact of the findings in this study is diverse. For example, one 
of the overall findings of this research effort suggests that when attending only to the 
visual modality, a high-quality visual display coupled with either a medium- or 
high-quality auditory display causes an increase in the overall visual quality perception of an 
auditory-visual display. Thus, suppose the fictitious company, ACME Cyber Art, sells 
contemporary paintings via the internet. ACME Cyber Art’s current web-based 
advertising only depicts photographs of the various paintings, which prospective 
customers can purchase on-line. ACME Cyber Art, however, wants to increase its sales. 
One possible strategy to increase sales is simply to add medium- or high-quality music to 
its web page while prospective customers are looking at the various artworks. As such, 
the perceived visual quality of the various artworks might increase relative to the same 
artworks viewed without music, thereby possibly increasing the probability that the 
customer will make a purchase. 

Another finding of this research effort suggests that when attending only to the 
auditory modality, a low-quality auditory display coupled with either a medium- or 
high-quality visual display causes a decrease in the overall auditory quality perception of an 
auditory-visual display. Thus, suppose the next GRAMMY Awards were partially 
decided via internet-based votes. As such, music fans would point their web browser to 
the GRAMMY Awards web site to cast their votes. This GRAMMY web site would 
contain high-quality visual images of the various nominated musical talents. By clicking 
on the visual image of a particular musical talent, one could hear a short 15-second audio 
clip of the nominated song. In an effort to 1) decrease rendering time, 2) decrease storage 
requirements, and 3) decrease download time, suppose the GRAMMY web site designers 
decreased the sampling frequency of the audio clips from 44.1 kHz to 10 kHz. As a 
result, to the surprise of the GRAMMY web site designers, most fans might complain that the 
quality of the audio clips was very poor, making it impossible to cast their votes properly. 
Consequently, the internet-based voting of the GRAMMY Awards might be a huge 
failure. 
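
To make the storage and download trade-off in this scenario concrete, the following is a minimal sketch of the arithmetic involved. It assumes uncompressed 16-bit stereo PCM audio with no file header; the bit depth, channel count, and function names are illustrative assumptions and are not taken from the experiments described in this dissertation.

```python
def pcm_clip_bytes(sample_rate_hz, seconds, bits_per_sample=16, channels=2):
    """Size in bytes of an uncompressed PCM clip (header not included).

    bits_per_sample and channels are assumed values for illustration.
    """
    return sample_rate_hz * seconds * (bits_per_sample // 8) * channels


def nyquist_hz(sample_rate_hz):
    """Highest frequency representable at the given sampling frequency."""
    return sample_rate_hz / 2


if __name__ == "__main__":
    for rate in (44_100, 10_000):
        size = pcm_clip_bytes(rate, seconds=15)
        print(f"{rate} Hz: {size} bytes, content up to {nyquist_hz(rate):.0f} Hz")
```

Under these assumptions the 15-second clip shrinks from 2,646,000 bytes to 600,000 bytes, but the representable bandwidth drops from 22,050 Hz to 5,000 Hz, which is the kind of audible degradation the scenario describes.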

Another finding of this research effort suggests that when attending to both 
auditory and visual modalities, a high-quality visual display coupled with a low-, 
medium- or high-quality auditory display causes an increase in the overall visual quality 
perception of an auditory-visual display. Thus, suppose a VE developer has been tasked 
with increasing the realism (and perhaps presence) of a 3D scene depicting a typical family 
living room. The current virtual living room contains a TV and stereo system which are 
rendered using high-quality visual graphics. However, the living room scene does not 
have any associated sounds. Instead of increasing the pixel resolution of the living room 
scene, causing an unwanted increase in the visual rendering time of the scene, the VE 
developer adds 1) high-quality music to the stereo system, and 2) an MPEG video 
sequence containing high-quality audio to the TV display. As a result, the perceived 
visual quality of the scene ought to increase simply through the addition of the associated 
auditory displays, without the need to manipulate any of the visual displays. 

These preceding examples highlight just some of the numerous possibilities 
affected by this research effort. Overall, the findings of this research effort are 
important and can greatly benefit the gaming business, multimedia industry, 
entertainment industry, VE community, and also the Internet industry. 


E. OBSERVATIONS 


The following describes some of the overall informal observations noted during 
the conduct of the main experiments. No formal data analyses are performed on the 
observations. The observations are presented in order to provide the reader with 
additional peripheral insights on the overall findings of this research effort. 


1. Response Time Measurement 


After observing 130 subjects throughout the course of the various experiments, 
the use of the rating scales to collect subject response times is perhaps invalid. The 
reason for this stems from the physical layout of the rating scales and the functionality of 
the mouse. Since the rating scales consist of one or two horizontal set(s) of radio buttons, 
the distance between the Push to Continue button and choice number one is greater than 
the distance between the Push to Continue button and choice number four. As a result, it 
will always take a longer time to select, for example, choice numbers one and seven as 
opposed to choice number four. To alleviate this problem, all response times need to be 
normalized to establish a common time metric among all choices. This normalization 
can be achieved through Fitts’s Law, which states that “...the time to move the hand to 
a target depends only on the relative precision required, that is, the ratio between the 
target’s distance and its size” [CARD83] (see [WICK92] for more information on Fitts’s 
Law). Nevertheless, Fitts’s Law was not considered in this research effort. 
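
Although no such normalization was performed in this research effort, the idea can be sketched as follows. The sketch assumes the commonly used Shannon formulation of Fitts’s Law, MT = a + b log2(D/W + 1), which may differ from the formulation given in [CARD83]. The constants a and b, the pixel dimensions, and all function names are hypothetical placeholders; in practice a and b would have to be fitted to calibration data for each subject and pointing device.

```python
import math

# Hypothetical screen geometry (pixels); not measured from the experiments.
BUTTON_SPACING_PX = 60    # horizontal gap between adjacent radio buttons
VERTICAL_OFFSET_PX = 100  # Push to Continue button sits below the middle choice
BUTTON_WIDTH_PX = 12      # target size of one radio button


def choice_distance(choice, n_choices=7):
    """Distance from the Push to Continue button to a given rating choice."""
    center = (n_choices + 1) / 2
    return math.hypot((choice - center) * BUTTON_SPACING_PX, VERTICAL_OFFSET_PX)


def index_of_difficulty(distance, width):
    """Shannon formulation of the Fitts index of difficulty, in bits."""
    return math.log2(distance / width + 1.0)


def movement_time(distance, width, a=0.1, b=0.15):
    """Predicted movement time in seconds; a and b are placeholder constants."""
    return a + b * index_of_difficulty(distance, width)


def normalized_response_time(raw_seconds, choice):
    """Raw response time minus the predicted mouse movement time."""
    predicted = movement_time(choice_distance(choice), BUTTON_WIDTH_PX)
    return raw_seconds - predicted
```

With this geometry, choices one and seven are equally far from the Push to Continue button and receive the same, larger correction than choice four, so equal raw times no longer imply equal decision times.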

In terms of the combined rating scale, some subjects complained that the visual 
scale should have been on the top whereas others preferred the current format with the 
auditory scale on top. The functionality of the mouse and mouse pad also has an 
undetermined effect on response time. Some subjects complained that the mouse would 
occasionally stick or slide improperly, while others did not experience any problems. 
Some subjects would keep their hands on the mouse the entire time, while others would 
place their hands in their laps, and then grab the mouse when it was time to make a 
response. On a side note, some subjects used the mouse/cursor to read all the instructions 
and also to point at salient quality features. Some subjects would also slide their cursor to 
the relative quality position of the rating scale even before the scale appeared. 
Furthermore, adept computer users are much more efficient at using the mouse than 
someone using the mouse’s point-and-click paradigm for the first time. Some 
subjects who were accustomed trackball users felt uncomfortable using the mouse. Given 
all the preceding observations, the use of the rating scales in all three experiments to 
capture response time ought to be considered invalid. Therefore, as stated earlier, any 
statistical analysis of the response times must keep in mind the 
aforementioned observations. 


2. Synesthesia Encounter 

In a discussion after the experiment, one of the female subjects said that 
she sometimes experienced various shades of colors when listening to classical music. 
She was not aware of the research that has been done concerning synesthesia. It was 
very interesting to discuss synesthesia with someone who actually experiences it. 


3. Subjects’ Description and Use of the Stimuli 


Perhaps the most interesting observations were gathered from the post-experiment 
questions which asked the subjects if they focused on any particular features when 
determining quality, and if so, to describe those features. The responses are 
remarkably diverse. This diversity stems from the various backgrounds of the subjects. For 
example, in describing a straight line on the radio, a computer graphics programmer 
might use the term aliasing, whereas the novice might use the term jaggedness. Also, 
some subjects felt that it was easier to determine the auditory and visual qualities 
simultaneously because they could use the stimulus in one modality to support their 
quality decision in the other modality. The following is an excerpted compilation of the 
items focused on by the subjects and also the terms used to describe what they focused on 
when determining visual and auditory display quality. 


a. Experiment 1: Static Resolution 


Visual Display Quality Terms: 


fonts, lines at edge, patterns, straight lines, text, control knobs, frame 
around frequency window, matrix on speaker pattern, numbers on frequency 
scale, name on radio, top left edge of radio, the “on” and “off” labels, the word 
“hallicrafters” on the radio, outside edges of radio, lower speaker line, the lines 
going through the image, dial, anti-aliasing, legibility of characters, the word 
“turning,” the number “12,” the upper right-hand portion of the radio, the 
white dots on speaker pattern, contrast of radio to background, pieces of dirt on 
top of radio, highlights, grill, letters, blurring of letters and numbers, ridges on 
dial, inconsistencies of corners and the line along the backside of the radio, the 
word “continental” on the radio, reflecting light, white knob. 


Auditory Display Quality Terms: 


sense of remoteness, cymbals, the cymbals crash, compressed versus open, 
frequencies, low sounded muddy and didn’t sustain, treble, guitar, highs versus 
lows, opening highs, high was more clear, high hat on drums, frequency range, 
dynamic range, the presence of the closer sound appeared to be of better 
quality, low was muffled and high was more treble, the counter point of low 
frequency organ line, the keyboard resonance was more dynamic in the highs 
than in the lows, high sounded tinny and low quality had more base, base/treble, 
more base in high and less base in low, high was painful and low was not 
painful, qualities seemed reversed, low sounded farther back and high sounded 
farther forward, the first note, drum sound, low quality was more pleasing, high 
was more irritating, low was more damped than high, the low quality sounded 
muted, snare drums, low sounded better, clearness of music, low had less 
volume, high was more broad sounding, bass was high, the poor music reminded 
me of music in a can, the good music was a definite stereo sound. 


Combined Auditory-Visual Display Quality Terms: 


It was hard to believe that the older radio could play the newer alternative 
music, reversal of auditory and visual qualities. 


b. Experiment 2: Static Noise 
Visual Display Quality Terms: 


small print above lower right and left dials, words under frequency scale, 
numbers on frequency scale, granularity quality of background, the “on” and 
“off” switch, name of radio, judge readability of alphanumerics, granularity of 
edges, brightness of white knob, better resolution means better quality, right 
side of radio, letters above the knobs, the word “continental,” mesh in speaker, 
reflection on front top, darkness of black, clarity of dial numbers, the amount of 
brownish distortion in black finish of radio, contrasts between light and dark, 
glare in front right top quadrant of radio, shine on top, shadows, light 
reflection, lower right-hand-corner, background static, sharpness of “on” “off” 
knob, grille holes, outlay of radio, looked at dots all over, fuzziness of the grid 
lines on the speaker, corners, graininess of picture, textures, haze on top and 
haze on reflection, bottom left of whole image. 


Auditory Display Quality Terms: 


piano accompaniment in the background, general level of static, clarity of 
bass, clearer is higher quality, the louder static was low quality and the lower 
static was the higher quality, differentiate the amount of static present, loudness 
of static versus loudness of audio signal, hiss level, bass tones, the crispness of 
the music, the frequency pitch of the static background noise, amount of snow/ 
interference, white noise level, amount of feedback, scratchiness, the frequency 
of static, level of noise, percent of volume taken up by noise, the loudness of the 
background rain, treble. 


Combined Auditory-Visual Display Quality Terms: 
sometimes reversed auditory and visual qualities. 


c. Experiment 3: Static Resolution Nonalphanumeric 


Visual Display Quality Terms: 


pixellation on lower leaf, outline of apple and fruit on the plate, upper edge 
of apple, right side of leaf on table, bottom edge of red rose, flowers, carpet, 
texture, shadowing, fruit skin, the roses, peach, pear, looking for continuous 
lines, clarity of black spot on pear, weave of cloth, rose petals, smoothness of 
apples, the overall colors, the brighter the better the quality, blade of grass in 
lower left corner, curved edges and color blends, the contrast with the yellow 
and red roses, looked at cleaner images, pink rose petals, hard edges, the pixels. 


Auditory Display Quality Terms: 


high-end tenor quality, high frequencies, low quality sounded as though it 
was played in a box, mushing sound for low quality, more pinging for high 
quality, tone increased with high quality sound, low quality has a deeper tone, 
high was tinny, the low was hollow sounding, the high was sharper, the chimes 
sounded muted and the high was full and loud, high quality had higher notes, 
bass was muffled and high had crisp cymbals, more bass means better quality, 
range of tones, muffling of resonance, equality of left and right ears, hissing or 
lack thereof in the background, low end fidelity and range of sound, things I 
could not express, tonal quality, clearness of bass, the higher pitched instrument 
coming through clearer, one is clear, the other is distant, the guitar in the back, 
loudness of the shower, brush strokes for the cymbals, the peaks, the more the 
instruments the more the quality. 

Combined Auditory-Visual Display Quality Terms: 

The bowl of fruit does not mix well with the choice of music. The choice of 
music should have been classical, reversal of audio and visual qualities, 
drumbeat and treble, the more the bass the better the quality. 

4. Reversals 
A very common response from the subjects was that they sometimes felt they may 
have reversed the rating of auditory and visual qualities. This auditory-visual dyslexia 
may be attributed to some of the findings concerning auditory-visual cross-modal 
perception. 


5. Recognizable Quality Levels 

Upon completion of the experiment, some subjects were astonished when they 
were told that only three levels of auditory and visual stimuli were utilized. Their 
astonishment is probably attributed to the number of choices on the rating scales (seven). 
Thus, subjects may have been anticipating seven levels of quality, and as a result 
conformed (perceptually) to accepting seven quality levels. 


F. RECOMMENDATIONS 


1. Recruiting Subjects 


The recruiting of volunteer subjects took much longer than 
originally planned. One should anticipate allocating more time to recruit subjects than the 
total amount of time needed to actually test subjects. 


2. Statistical Analysis Package 


Because the statistical analysis software package was chosen, and its use mastered, 
well in advance of collecting data, the data analysis portion was accomplished 
with much greater ease. 




3. Hardware and Software Platform 


Because of the immense amount of time and data lost due to hardware and 
software related issues during the experimental design phase of this research effort, it is 
crucial to ensure the reliability and usability of all chosen hardware and software as early 
as possible in the design phase. 


4. Downloaded Software 


The freely downloadable software used in this effort greatly facilitated 
the software development of the main experiments, since the experimenter merely has to 
download the software and start developing. There is no need to waste time venturing out 
to the computer software store. Furthermore, since the software is free, precious research 
funding can be used for other things such as hardware. 


5. Photoshop and SoundForge 


This research would not have been possible without the software to create the 
various visual and auditory displays. Adobe Photoshop [ADOB98] and Sonic Foundry’s 
SoundForge [SONI98] proved to be outstanding software packages and their use is 
highly recommended. 


6. Visual Dominance 


It is interesting to note that, because this dissertation is a written document, only 
the visual stimuli can be presented to the reader, which is evident from the numerous 
figures. The auditory stimuli can only be imagined. Thus, the reader has a much better 
understanding of the visual stimuli, but not the auditory stimuli. Is this not another 
example of visual dominance? 

G. FUTURE WORK 


1. Choice of Quality Parameters and Stimuli 

Since pixel resolution, Gaussian noise level, and sampling frequency were the 
only quality parameters manipulated, the use of other quality metrics is warranted. 
Furthermore, studies of the effects of various other stimuli, such as motion video and 3D 
VEs, are also needed. As such, a greater scope of potential auditory-visual perception 
phenomena can be investigated. 

One possible scenario using a VE might first include the process of having 
subjects watch a virtual person (in 3D space) place a radio (playing music) on a table. 
After this initial process of watching the virtual radio being placed (dynamically) on the 
virtual table, subjects might perceive a stronger perceptual grouping between the radio 
(visual) and music (audio) through increased temporal and spatial synchronization, 
thereby decreasing the cognitive distance between the radio (visual) and music (audio). 
As a result, if the same experiments outlined in this dissertation were then conducted 
after this initial process, the overall results might indicate an increase in statistically 
significant auditory-visual cross-modal perception phenomena. 


2. Auditory-Visual Quantitative Perceptual Model 


Given that auditory-visual cross-modal perception phenomena exist, the next 
logical step is to incorporate these overall findings into some type of useful 
auditory-visual quantitative perceptual model (similar to that proposed by Hollier and Voelcker 
[HOLL97] as depicted earlier in Figure 29). This model can then be used to derive 
appropriate (quantitative) levels of auditory and visual fidelity for use by developers in 
the gaming business, multimedia industry, entertainment industry, VE community, and 
the Internet industry, etc. For example, given a certain application, this auditory-visual 
quantitative perceptual model could help to derive the appropriate levels and specific 
amounts of visual display pixel resolution and auditory display sampling frequency as a 
function of visual-only, auditory-only, and/or combined auditory-visual media. 


3. Intersensory Research 


The exhaustive literature review and results of this research effort make it clear 
that in order to better understand the proper use of multisensory stimuli, more research 
emphasis needs to be placed on investigating intersensory phenomena. This increased 
emphasis need not be limited to auditory-visual interactions but ought to include 
investigating auditory-visual-haptic interactions. 


4. On-line Experiments 


Because of the potential to easily acquire many (perhaps thousands of) subjects, the 
use of on-line experiments can greatly facilitate scientific research. As such, all the 
experiments contained in this research effort can be used on-line. However, on-line 
experiments make it difficult to control the conditions of the experiment (i.e., hardware 
specifications, proper subject participation, environmental conditions, etc.). Being able to 
control the conditions is vital when conducting experiments. Nevertheless, a first attempt 
has been made towards conducting on-line experiments which can hopefully be used 
toward future on-line research. 


H. FINAL THOUGHTS 


It is hoped that this dissertation will help to bridge the current multi-disciplinary 
gap among multimedia and VE developers. Furthermore, this dissertation is intended to 
become the key reference that researchers need to read before attempting to evaluate 
multi-modal perceptual effects in combined auditory and visual displays. 

APPENDIX D. INTERNET RESOURCES 


The first section of this appendix contains the URLs of some research institutions 
which are currently doing research in various aspects of sound. The second section 
contains the URLs of various sound-related commercial products. 

Auditory Perception Lab, Dept. of Psychology, University of California, 
Berkeley: http://ear.berkeley.edu/auditory_lab/ 


Center for Computer Research in Music and Acoustics (CCRMA), Dept. of 
Music, Stanford University: http://ccrma-www.stanford.edu/Welcome.html 

Center for Experimental Music and Intermedia (CEMI), University of North 
Texas: http://www.scs.unt.edu/cemi/cemi.htm 

Center for New Music and Audio Technologies (CNMAT), University of 
California, Berkeley: http://www.cnmat.berkeley.edu/ 

Center for Research in Computing and the Arts (CRCA), University of 
California, San Diego: http://crca-www.ucsd.edu 

Center for Research in Electronic Art Technology (CREATE), Dept. of Music, 
University of California, Santa Barbara: http://www.ccmrc.ucsb.edu/ 

Center for Studies in Music Technology (CSMT), Yale University: 
http://www.music.yale.edu/ 

Dipartimento di Ingegneria Industriale, University of Parma, Angelo Farina: 
http://pcfarina.eng.unipr.it/ 

Faculty of Music, McGill University, Montréal: http://www.music.mcgill.ca/ 

Graphics, Visualization, and Usability Center, Georgia Tech: 
http://www.cc.gatech.edu/gvu/multimedia/ 

Harvard Computer Music Center, Harvard University: 
http://www-mario.harvard.edu 

Hearing Development Research Laboratory (HDRL), Waisman Center, 
University of Wisconsin: http://www.waisman.wisc.edu/hdrl/ 

Human Interface Technology Lab (HIT LAB), University of Washington: 
http://www.hitl.washington.edu/ 

Human Research and Engineering Directorate (HRED), Army Research 
Laboratory: http://www.arl.mil/ARL-Directorates/HRED/hred.html 

Image Synthesis Group, Dept. of Computer Science, Trinity College, Dublin: 
http://vangogh.cs.tcd.ie/ 

Institut de Recherche et Coordination Acoustique/Musique (IRCAM), Institute 
for Acoustic/Music Research: http://www.ircam.fr 

Interval Research Corporation, Palo Alto, California: http://www.interval.com 

Laboratory of Acoustics and Audio Signal Processing, Helsinki University of 
Technology (HUT): http://www.hut.fi/HUT/Acoustics/index.html 

Machine Listening Group, MIT Media Lab, Massachusetts Institute of 
Technology: http://sound.media.mit.edu/ 

National Center for Supercomputing Applications (NCSA), University of 
Illinois at Urbana-Champaign: http://www.ncsa.uiuc.edu/ 

NASA Ames Research Center, Moffett Field, California: 
http://www.arc.nasa.gov/ 

NAVE Research Group, Dept. of Computer Science, University of Colorado at 
Boulder: http://www.cs.colorado.edu/~cboyd/ 

Norwegian network for Technology, Acoustics and Music (NoTAM), University 
of Oslo: http://www.notam.uio.no/index-e.html 

Parmly Hearing Institute, Loyola University Chicago: 
http://parmly-2.ls.luc.edu/parmly/ 

Princeton Sound Kitchen, Princeton University: 
http://www.music.princeton.edu:80/PSK/ 

SCCP Virtual Reality SOUND, University of Aizu: 
http://www-ci.u-aizu.ac.jp/VirtualReality/WWW/sound.html 

Sound Localization Research, San Jose University: 
http://www-engr.sjsu.edu/~duda/Duda.Research.html 

Visual Systems Laboratory, University of Central Florida: 
http://www.vsl.ist.ucf.edu/ 

The WORLDSONG Project: http://www.hyperreal.com/~mpesce/worldsong.html 

York University Music Technology Group, The University of York: 
http://www.york.ac.uk/inst/mustech/3d_audio/ambison.htm 

This section contains the URLs of various sound-related commercial products. 

AdB International Corporation: http://www.adbdigital.com/ 

Aureal Semiconductor: http://www.aureal.com 

The Binaural Source: http://www.btown.com/binaural.html 

CATT: http://www.netg.se/~catt/ 

Chromatic Research: http://www.chromatic.com/ 

Circle Surround: HED OManrsUr rotate t/ 

Creative Labs: http://www.creaf.com/ 

Crystal River Engineering: http://www.cre.com/index.html 

DirectSound Xtra: http://www.directxtras.com/ds_home.htm 

Dolby Laboratories: http://www.dolby.com/ 

E-mu Systems Inc.: http://www.emu.com/ 

Ensoniq Corporation: http://www.ensoniq.com/ 

Firsthand: http://www.firsthand.com/ 

HeadRoom: http://headroom.headphone.com/ 

Headspace: http://www.headspace.com 

HoonTech: http://www.hoontech.co.kr/hoontech_eng.html 

Lake DSP: http://www.lakedsp.com/ 




Level Control Systems: http://www.lcsaudio.com/cs.html 

Lexicon: http://www.lexicon.com/ 

MIDI Home Page: http://www.eeb.ele.tue.nl/midi/index.html 

MIDI Manufacturers Association: http://www2.midi.org/mma/ 

Muscle Fish: http://www.musclefish.com/ 

NuReality: http://www.nureality.com/ 

Paradigm Simulation Inc.: http://www.paradigmsim.com/ 

Pyramid Systems: hitp:/Amgweb.conm/psi/ 

Qsound: http://www.qsound.ca/ 

RealAudio: http://www.real.com/ 

Reality by Design, Inc.: http://www.rbd.com/ 

Realistic Sound Experience (RSX) Technology: http://www.intel.com/ial/rsx/ 

Roland Sound Space: http://www.rolandcorp.com/products/PA/RSS-10.html 

SENSE8: http://www.sense8.com/ 

Sound Retrieval System (SRS): http://www.srslabs.com/ 

Sony IMAX Theatre: 
http://www.spe.sony.com/Pictures/sonytheatres/imax/imaxtech.html 

Spatializer Audio Laboratories: 
http://www.catalog.com/cgibin/var/3dstereo/index.html 

Symbolic Sound Corporation: http://www.SymbolicSound.com/ 

THX: http://www.thx.com/ 

Tucker-Davis Technologies Inc.: http://tdt-quikki.com/ 

Unofficial SGI Audio Apps List: 
http://reality.sgi.com/employees/cook/audio.apps/ 

Virtual Audio Imager (VAI): http://www.purestereo.com/brown.html 

Visual Synthesis Incorporated (VSI): http://www.vsicorp.com/ 